Combining automotive radar and LiDAR for surface detection in adverse conditions

Funding: Engineering and Physical Sciences Research Council, Grant/Award Numbers: EP/N012402/1, EP/S000631/1; Jaguar Land Rover, Grant/Award Number: EP/N012402/1; MOD University Defence Research Collaboration (UDRC) in Signal Processing

Abstract: Automotive radar and light detection and ranging (LiDAR) sensors have complementary strengths and weaknesses for 3D surface mapping. We present a method using Markov chain Monte Carlo sampling to recover surface returns from full-wave longitudinal signals that takes advantage of the high spatial resolution of the LiDAR in range, azimuth and elevation together with the ability of the radar to penetrate obscuring media. The approach is demonstrated using both simulated and real data from an automotive system.


| INTRODUCTION
We report a new approach to combine frequency-modulated continuous wave (FMCW) automotive radar with full waveform pulsed light detection and ranging (LiDAR) data to detect surface returns with high spatial resolution in all three dimensions. This low-level data fusion allows us to take advantage of the weather-penetrating properties of the automotive radar without sacrificing the detailed mapping capability of the LiDAR. Our method relies on Markov chain Monte Carlo (MCMC) analysis of a likelihood function based on the returned LiDAR and radar signals and hence is able to detect a peak return from an impinged surface within the high spatial resolution of the LiDAR footprint.

| BACKGROUND
Autonomous vehicles and advanced driver assistance systems require robust and resilient sensing for situational awareness, to simultaneously locate their position and map the surrounding environment, to detect the presence of road users including vehicles, pedestrians and other entities such as surrounding buildings, barriers and other traffic hazards. To that end, automotive sensing systems carry a multimodal suite of sensing technologies, of which passive optical camera, LiDAR and radar systems are the most informative [1].
Each technology has particular strengths and weaknesses. LiDAR is a mature technology that has found wide application in the automotive sector [2,3]. Many manufacturers supply systems capable of capturing more than one million three-dimensional points per second, with very good azimuthal resolution, typically 0.1°, which resolve in elevation by supplementing horizontal scanning with multiple vertical beams. As such, LiDAR systems are very effective in detailed surface mapping during the day and at night due to this very accurate, active light projection. The surface sampling density can be further improved by frame-to-frame data accumulation [4]. In contrast, passive optical camera technology is very effective in providing data for object classification [5], due in part to the rich reflectance data provided from the surface by a broadband source, and in part to the massive research effort that has gone into such classifiers with the advent of deep learning and massive GPU clusters. Camera data has also been combined with LiDAR data to good effect to further improve vehicle detection in dense urban environments [6] by marrying shape with reflectance signatures.
Automotive radar systems [7] are used widely to detect and measure the relative positions and velocities of other vehicles and objects in the immediate vicinity. Radar has a poor 3D resolution in comparison with optical technologies and is usually restricted to range-azimuth data only. Typically, the range resolution is of the order of 20 cm and the azimuthal resolution 1–2°, due to the large angular beam divergence. This makes it very difficult to provide detailed scene mapping or reliable object recognition in comparison with optical technology. In practice, the majority of object detection or classification systems rely on Doppler signatures [8,9]; this is problematic if the vehicle, pedestrian, bicycle or other object is stationary. However, radar technologies succeed where optical technologies fail in adverse weather, such as snow [10], rain [11] and fog [12].
Hence, in adverse conditions, we would like to retain the detailed surface resolution of a LiDAR system together with the capability of an automotive radar to provide robust data as the weather deteriorates. LiDAR surface mapping depends on the accurate detection of surface returns, a process that is severely degraded in the presence of obscuring media. In this article, our contribution is a new algorithmic approach to combine longitudinal (range) laser and radar reflections to robustly find the true surface reflection in adverse conditions. This requires full waveform analysis in which the properties of the obscuring media are modelled [13]. The position in the longitudinal return defines the range; the azimuthal and elevation coordinates are defined by the pointing angle of the LiDAR signal, as is always the case.

| DETECTION OF SURFACE RETURNS THROUGH OBSCURING MEDIA USING MCMC
An example of data collected near Edinburgh on a foggy day by our trial vehicle (Figure 1) is shown in Figure 2. Although the radar image, in which two cars are visible, provides data to 60 m in this instance, the optical sensors lose the second car, and the LiDAR fails to detect surfaces such as the road beyond a few metres. The problem is that accurate automotive LiDAR systems require strong, identifiable peaks in the return signal that can be detected using simple techniques such as cross-correlation or constant false alarm rate (CFAR) detection. These methods are not applicable through obscuring media.
When there is an obscuring medium, as illustrated for LiDAR data in Figure 3, reproduced from [14], the surface return can be 'invisible' beyond, or 'lost' within, the returns from the intervening medium, and so very difficult to detect. As explained in [14], if we consider the propagation distance of photons between independent scattering events within the medium as exponentially distributed, the shape of the returned histogram of photon events is a gamma distribution. These observations are consistent with other experimental studies reported in Pfennigbauer et al. [15] and Tobin et al. [16].

FIGURE 1 The sensor suite on the trial vehicle. From left to right are shown a Velodyne HDL-32E 905 nm 32-channel LiDAR, a ZED stereo camera pair and a Navtech CIR104 79 GHz radar system.

FIGURE 2 Frames from camera, radar and LiDAR sequences acquired by the vehicle of Figure 1. These were taken on the A702 near Edinburgh in fog. The LiDAR is very restricted in range and fails to detect returns from the approaching car or the road beyond a few metres, and the camera image is also obscured by fog. In contrast, the radar detects the two vehicles and other surface returns.
Hence, we address the problem of detection of surface returns, or peaks, within and beyond the obscuring medium. In many respects, the problem is similar to the problem of imaging through a tree canopy [17], but here, the distributed returns come from the continuous, obscuring medium rather than from the tree foliage and bark. The problem may be simpler in a dense medium because the medium is then homogeneous (see [15]) which makes the gamma model more accurate. If the medium is less dense, then there are inhomogeneities that result in intervening signal peaks that are not real surface returns, evident in Figure 3.

| The LiDAR system model
To be effective in adverse conditions, a full-waveform LiDAR signal return is required. To acquire such data, we can employ linear-mode avalanche photodiodes or single-photon avalanche diode (SPAD) detectors [18] in time-correlated single-photon counting (TCSPC) mode. The latter approach provides high sensitivity and excellent depth resolution at low laser power levels, but for application in the automotive environment the technique has suffered from long data acquisition times. However, linear SPAD arrays are now being employed in automotive systems using mechanical azimuthal scanning, and new developments in arrayed technology coupled with fast algorithms [19] and parallel processing [20] promise full 3D LiDAR cameras in the near future. Figure 4 shows a fitted mathematical model to a strong, instrumental signal from a full-wave, TCSPC-SPAD system [21,22]. We validated this mathematical model as the piecewise function of Equation (1), with a central Gaussian peak and an exponential rise and fall on either side (with different time constants), a model consistent with the physical characterisation of such semiconductor devices [23]. The signal model is defined by

$$
f(i;\boldsymbol{\phi}) =
\begin{cases}
\beta\,e^{-t_1^2/(2\sigma^2)}\,e^{(i-t_0+t_1)/\tau_1}, & i \le t_0 - t_1,\\
\beta\,e^{-(i-t_0)^2/(2\sigma^2)}, & t_0 - t_1 < i \le t_0 + t_2,\\
\beta\,e^{-t_2^2/(2\sigma^2)}\,e^{-(i-t_0-t_2)/\tau_2}, & t_0 + t_2 < i \le t_0 + t_3,\\
\beta\,e^{-t_2^2/(2\sigma^2)}\,e^{-(t_3-t_2)/\tau_2}\,e^{-(i-t_0-t_3)/\tau_3}, & i > t_0 + t_3,
\end{cases}
\tag{1}
$$

where i is the time or channel index and the parameter vector is ϕ = (β, t_0, σ, t_1, t_2, t_3, τ_1, τ_2, τ_3). For a given LiDAR return, the amplitude, β, and position, t_0, are the unknown parameters of interest, and (σ, t_1, t_2, t_3, τ_1, τ_2, τ_3) define the fixed shape parameters of the known system response; t_1, t_2 and t_3 are the breakpoints for the piecewise exponential. The fixed shape parameters are determined experimentally from a fit using an optimisation procedure to minimise the error between the model and the real, instrumental data.
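As a minimal sketch, the piecewise instrumental model of Equation (1) can be evaluated as follows. This is an illustrative reconstruction only: the continuity factors at the breakpoints follow our reading of the model (exponential rise, central Gaussian, two-stage exponential decay), and the parameter values used below are hypothetical rather than fitted instrument values.

```python
import math

def instrumental_response(i, beta, t0, sigma, t1, t2, t3, tau1, tau2, tau3):
    """Sketch of the piecewise model of Equation (1): an exponential rise,
    a central Gaussian peak, and a two-stage exponential fall, matched so the
    function is continuous at the breakpoints t0 - t1, t0 + t2 and t0 + t3."""
    g = lambda t: math.exp(-t * t / (2.0 * sigma * sigma))  # Gaussian shape
    if i <= t0 - t1:
        # exponential rise before the peak, continuous at t0 - t1
        return beta * g(t1) * math.exp((i - (t0 - t1)) / tau1)
    if i <= t0 + t2:
        # central Gaussian peak, maximum beta at i = t0
        return beta * g(i - t0)
    if i <= t0 + t3:
        # first exponential decay stage, continuous at t0 + t2
        return beta * g(t2) * math.exp(-(i - (t0 + t2)) / tau2)
    # second, slower decay stage, continuous at t0 + t3
    return (beta * g(t2) * math.exp(-(t3 - t2) / tau2)
            * math.exp(-(i - (t0 + t3)) / tau3))
```

Fitting the fixed shape parameters to real instrumental data, as described above, would then be a standard least-squares optimisation over (σ, t_1, t_2, t_3, τ_1, τ_2, τ_3).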
We have continued to use this model and confirm its validity; for example, in recent work we have shown that it is possible to train a neural network to recognise real returns in TCSPC systems using data simulated by that model [24].

| Modelling the obscuring distribution
In addition to the LiDAR system model, we also require a validated, mathematical model of the intervening medium, as for example the fog illustrated in Figure 2.
In the first instance, our colleagues conducted many measurements of targets through different obscuring media such as water fog, glycol-based vapour and incendiary smoke [16]. These measurements showed sparser photon counts through increasingly dense media and, if aggregated over several measurements, showed an exponential decay with distance; but as the data was time-gated at a pre-determined distance, this could not validate the near-field response. In contrast, Pfennigbauer et al. [15] and Satat et al. [14] did measure the returns from full-wave LiDAR systems through the range from the near to the far field, as shown for example in Figure 3. Both these studies show that the underlying gamma model is a good representation for the returns from a homogeneous, foggy medium, but that if the fog is inhomogeneous, then this model is less accurate.

FIGURE 3 Example of a surface target imaged through thick fog (©2018 IEEE). The histogram shows the real data. The fitted curve (in green) shows the summation of returns from the fitted gamma distribution and the single, Gaussian surface return [14].

FIGURE 4 An example of a fitted, full-wave, instrumental return from a TCSPC-SPAD LiDAR system. The real data is shown in blue; the parameterised model of Equation (1) is shown in red. The time axis is set at 2.44 ps intervals for each storage bin, with index i.
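As a sketch of the gamma model discussed above, the following code evaluates a gamma-shaped return rate per range bin and draws Poisson photon counts from it to produce a simulated fog histogram. The shape and scale values used in practice would be fitted to measurements; the ones here are purely illustrative.

```python
import math
import random

def fog_return_rate(i, beta_fog, shape, scale):
    """Gamma-distribution model of the aggregated fog backscatter,
    evaluated at range-bin index i (i > 0). Illustrative sketch only."""
    return beta_fog * (i ** (shape - 1.0)) * math.exp(-i / scale) / (
        math.gamma(shape) * scale ** shape)

def simulate_fog_histogram(n_bins, beta_fog, shape, scale, seed=0):
    """Draw Poisson photon counts per range bin from the gamma-shaped rate."""
    rng = random.Random(seed)
    counts = []
    for i in range(1, n_bins + 1):
        lam = fog_return_rate(i, beta_fog, shape, scale)
        # simple Poisson sampler by inversion (adequate for small rates)
        u, k, p = rng.random(), 0, math.exp(-lam)
        c = p
        while u > c:
            k += 1
            p *= lam / k
            c += p
        counts.append(k)
    return counts
```

Because the gamma density integrates to one, the parameter beta_fog directly controls the total expected fog photon count, which is convenient when mixing the fog term with surface returns and background as in the likelihood below.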

| Analysing the LiDAR data
The observed LiDAR signal, y = {y_1, y_2, …, y_{i_max}}, for k surface returns is a sample from a non-normalised statistical mixture distribution [21] with density

$$
\lambda(i) = \sum_{j=1}^{k} f(i;\phi_j) + \beta_{fog}\, f_{fog}(i) + B_{bg},
\tag{2}
$$

where ϕ_j = {β_j, t_0j} is the set of parameters for each signal whose instrumental function f is known, β_fog f_fog models the gamma distribution with known or measured parameters, and B_bg is the known, assumed constant, background. The amplitude, β_j, and position, t_0j, of each of the k surface returns is unknown. Assuming a discrete time (distance) histogram as signal representation, the value y_i recorded in each channel can be considered as a random sample from a Poisson distribution:

$$
P(y_i \mid k, \boldsymbol{\phi}) = \frac{\lambda(i)^{y_i}\, e^{-\lambda(i)}}{y_i!}.
\tag{3}
$$

Assuming that the measurements in each channel i are conditionally independent given the value of the parameters, the joint probability distribution of y is

$$
L(\mathbf{y} \mid k, \boldsymbol{\phi}) = \prod_{i=1}^{i_{max}} \frac{\lambda(i)^{y_i}\, e^{-\lambda(i)}}{y_i!}.
\tag{4}
$$

The data points y_i are fixed and negative values of L(y | k, ϕ) are not possible, so we maximise an equivalent logarithmic likelihood function, LL(y | k, ϕ), as is common practice [21]:

$$
LL(\mathbf{y} \mid k, \boldsymbol{\phi}) = \sum_{i=1}^{i_{max}} \left[\, y_i \log \lambda(i) - \lambda(i) - \log(y_i!)\, \right].
\tag{5}
$$

We make a number of simplifying assumptions to make the problem more tractable. First, we assume B_bg is known and constant, a not unreasonable assumption as this is relatively stable. Second, we assume that k is known; the LiDAR system of Figure 1 has a single or dual pulse mode, k = 1, 2, so that it is consistent with our real data. Third, we assume the parameters of the fog gamma distribution are known. This is the most difficult assumption; while one can fit a gamma distribution to fog data [14,15], this is not likely to be stable, and non-homogeneous regions lead to additional peaks of unknown shape in the signal return, as in Figure 3.
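The Poisson log-likelihood of Equation (5) can be sketched as below, dropping the constant log(y_i!) term since it does not affect maximisation. The function and argument names here are ours for illustration, not from a published implementation; `response` stands for the instrumental model f of Equation (1) and `fog_rate` for the gamma fog term.

```python
import math

def log_likelihood(y, peaks, fog_rate, b_bg, response):
    """Poisson log-likelihood of Equations (2)-(5), constants dropped.
    y        : list of photon counts per channel
    peaks    : list of (beta_j, t0_j) surface-return parameters
    fog_rate : callable i -> expected fog counts in channel i
    b_bg     : constant background level
    response : callable (i, beta, t0) -> instrumental model value"""
    ll = 0.0
    for i, yi in enumerate(y):
        # mixture rate of Equation (2): surfaces + fog + background
        lam = b_bg + fog_rate(i) + sum(response(i, b, t0) for b, t0 in peaks)
        ll += yi * math.log(lam) - lam  # Poisson term, log(y_i!) omitted
    return ll
```

A constant positive background b_bg keeps the rate strictly positive, so the logarithm is always defined.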
Hence, we wish to infer the positions and amplitudes of one or more surface (not fog) returns, in which the position indicates the range and the amplitude is proportional to the optical reflectance of each impinged surface. The azimuthal and elevation coordinates are defined by the beam pointing angle. We make no informative prior assumption: the position, t_0j, is drawn from a uniform distribution on the interval [0, i_max], and the amplitude from a uniform distribution within the sensor dynamic range. We construct a Markov chain whose transitions involve changes to those positions and amplitudes, using a delayed rejection step for the positional update and a gamma distribution for the amplitude updates with scale parameter depending on the current value of the Markov chain, as described in detail in [21]. After burn-in, we draw our estimates of the signal parameters from the median or mean values of those chains.
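The structure of the sampler can be illustrated with a plain Metropolis-Hastings sketch for a single return (k = 1). Note this deliberately simplifies our actual scheme: the delayed rejection step and the state-dependent gamma amplitude proposals of [21] are replaced here by symmetric random-walk proposals, and the step sizes are arbitrary choices for illustration.

```python
import math
import random

def mcmc_peak_search(loglike, i_max, amp_max, n_iter=6000, burn_in=1000, seed=1):
    """Simplified random-walk Metropolis-Hastings sketch for one surface
    return. loglike(t0, beta) returns the log-likelihood of a proposed
    position/amplitude pair; priors are uniform as described in the text."""
    rng = random.Random(seed)
    t0 = rng.uniform(0.0, i_max)      # uniform prior on position
    beta = rng.uniform(0.0, amp_max)  # uniform prior on amplitude
    ll = loglike(t0, beta)
    samples = []
    for it in range(n_iter):
        # symmetric random-walk proposals, clipped to the valid range
        t0_p = min(max(t0 + rng.gauss(0.0, i_max * 0.02), 0.0), i_max)
        beta_p = min(max(beta + rng.gauss(0.0, amp_max * 0.05), 1e-6), amp_max)
        ll_p = loglike(t0_p, beta_p)
        # Metropolis accept/reject on the log-likelihood ratio
        if rng.random() < math.exp(min(0.0, ll_p - ll)):
            t0, beta, ll = t0_p, beta_p, ll_p
        if it >= burn_in:
            samples.append((t0, beta))
    # point estimates from the median of the post-burn-in chain
    ts = sorted(s[0] for s in samples)
    bs = sorted(s[1] for s in samples)
    return ts[len(ts) // 2], bs[len(bs) // 2]
```

In the full method, delayed rejection lets a rejected positional move be retried with a different proposal scale, which helps the chain escape spurious fog peaks.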

| Incorporation of the radar data
In general, our experimental hypothesis is that true LiDAR surface returns of low amplitude will be difficult to detect in dense fog, as the majority of the signal return will be from multiple reflections within the medium itself. Therefore, we use the concurrent radar data to provide an additional likelihood term to aid the detection of such returns, on the assumption that there should be a coincidence of time of arrival of such signals in the LiDAR and radar sensors. However, the physical phenomena of such returns are quite different, and the radar signal, in particular, is difficult to model, such that we do not anticipate an exact physical relationship. As stated above, the range resolution of the radar employed in our work is approximately 17.5 cm, compared to 2 cm for the LiDAR system, and the azimuthal resolution is of the order of 1.8°, which equates to approximately 63 cm at 20 m, in comparison to 4 cm for the LiDAR. Therefore, we stress that the key objective in processing the LiDAR data using concurrent radar, rather than processing simply the radar itself, is to improve azimuthal and elevation resolution, although an improvement in range resolution is a bonus.
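The quoted cross-range figures follow directly from the beam geometry; a quick check, using a small-angle chord approximation:

```python
import math

def cross_range_extent(beamwidth_deg, range_m):
    """Cross-range extent of a beam of angular width beamwidth_deg at a
    given range (chord of the beam cone, small-angle approximation)."""
    return 2.0 * range_m * math.tan(math.radians(beamwidth_deg) / 2.0)

radar_xr = cross_range_extent(1.8, 20.0)  # radar, 1.8 degrees at 20 m
lidar_xr = cross_range_extent(0.1, 20.0)  # LiDAR, 0.1 degrees at 20 m
```

This reproduces the approximately 63 cm radar and 4 cm LiDAR cross-range resolutions quoted in the text.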
We consider the radar measurement as independent and incorporate it in the joint likelihood formulation to obtain

$$
LL_{r}(\mathbf{y} \mid k, \boldsymbol{\phi}) = LL(\mathbf{y} \mid k, \boldsymbol{\phi}) + \sum_{j=1}^{k} \log F_r(t_{0j}),
\tag{6}
$$

where F_r(t_0j) defines a term arising from the radar signal at the current proposed surface location t_0j. The radar signal is relatively unaffected by fog, and one can again assume a constant background level, so we have incorporated the radar data in two ways. First, we use the weighted exponential of the power signal (measured in dB), F_r(t_0j) = e^{λ P_r(t_0j)}, as a measure of likelihood; this has the advantage of avoiding a strong prior commitment to any region of the received signal. Second, we apply a weighted signal detection algorithm (CFAR) to the radar signal, F_r(t_0j) = e^{λ γ(P_r, t_0j)}, where γ represents a binary variable (detection or no detection in channel t_0j of the radar signal) that is normalised to sum to one across the signal range. While this encourages the MCMC chain to explore those regions with evidence of detection, there is the chance of non-graceful degradation if a threshold is applied. The azimuthal and elevation beam spreads of the radar data are much larger and the physical phenomena are quite different, but in most cases a LiDAR return should have a coincident (in time, range) radar signal, and the MCMC process filters the true from the false returns using both signals.
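The two forms of the radar term can be sketched as follows; the function names and the list-based signal representation are ours for illustration.

```python
import math

def radar_power_term(power_db, t0, lam):
    """Power form: F_r(t0) = exp(lambda * P_r(t0)), with P_r(t0) the radar
    power (in dB) at the proposed surface channel t0."""
    return math.exp(lam * power_db[t0])

def radar_cfar_term(detections, t0, lam):
    """CFAR form: F_r(t0) = exp(lambda * gamma(t0)), where gamma is the
    binary CFAR detection mask normalised to sum to one over the signal."""
    total = sum(detections)
    gamma = detections[t0] / total if total else 0.0
    return math.exp(lam * gamma)
```

The power form varies smoothly with the received signal, while the CFAR form concentrates the likelihood bonus on detected channels, which is the trade-off between graceful degradation and focused exploration discussed above.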

| Simulation
In the first experiment, we simulate a LiDAR signal with three peaks at arbitrary positions in the range 0-100 m, as shown in Figure 5, having amplitudes of 8, 4 and 2 counts at positions 8.68, 26.04 and 52.08 metres respectively. We use the LiDAR signal to generate a simulated radar signal, as shown in Figure 6, in which the signal return widths and noise levels are consistent with the real radar data shown in Figure 12. We apply the MCMC process described in Section 4 on the basis of the log-likelihood functions of Equations (6) and (5), with and without consideration of the concurrent radar data respectively. The weighting λ included in the definition of F_r(t_0j) below Equation (6) can be selected to represent the relative importance of the LiDAR and radar returns, for example dependent on fog density. In these experiments it has been set to the value of LL from Equation (5); that is, it is weighted adaptively to the strength of the LiDAR log-likelihood function. Figures 7 and 8 show histograms of the position of the detected peaks in each case. We generated different random signals for each of 100 separate trials, with a burn-in of 5000 and a total of 20,000 iterations, although these parameters were not critical. The first conclusion is that peak detection using the concurrent radar data is much more successful, as the peaks are clustered about the true positions, although not all members of a cluster in Figure 7 can be classed as true detections, as some are far from the true value. In Table 1 we use an arbitrary cut-off of 0.25 m to define a true detection, and note that less than half are within that tight criterion even with radar assistance. The second point is that the MCMC process finds a set number of returns (three), but any given chain may find more than one peak at or around a true location.
Allowing a more tolerant definition of a true return, within 1 m, and again referring to Table 1, in only 38% of 100 trials did the algorithm find returns in each of the three clusters, and in some cases these might be more than 0.25 m from the true locations. For the data without radar, there are many more complete outliers, and as can be seen in Table 1 it is much less likely to find all three returns and indeed failed to find any on 3% of occasions.
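The true-detection criteria used in these tallies amount to counting, for each true peak, whether any detected peak falls within the chosen cut-off. A minimal sketch, using the simulated peak positions from the text and a hypothetical set of detections:

```python
def tally_detections(found, truth, tol):
    """Count how many true peak positions have at least one detected peak
    within tol metres (the 0.25 m and 1 m criteria used in the text)."""
    return sum(1 for t in truth if any(abs(f - t) <= tol for f in found))

truth = [8.68, 26.04, 52.08]        # simulated peak positions from the text
found = [8.71, 26.90, 53.40, 5.10]  # hypothetical MCMC detections

strict = tally_detections(found, truth, 0.25)  # tight criterion
loose = tally_detections(found, truth, 1.00)   # tolerant criterion
```

Applied over 100 trials, counts of this kind yield the Table 1 statistics with and without radar assistance.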
Of course, one could assert that in the simulation the three radar peaks are rather distinct, and one does not need LiDAR data. However, this misses the point that the LiDAR data is accurate in cross-range, whereas the radar signal does not define well the direction of the target. As we shall see in Section 5.2, for the real data there are also several peaks in the radar signal that are not coincident in range with the LiDAR direction of interest, and should thus not be construed as real surface returns.

| Using real data
We have collected several hours of concurrent camera, LiDAR and radar data [12] from the vehicle illustrated in Figure 1 which includes 3 hours of annotated radar images with more than 200K labelled road actors in total in a variety of weather conditions (e.g., sun, night, rain, fog and snow) and driving scenarios. This dataset was collected between February 2019 and February 2020, using the Robot Operating System, and is available for download at http://pro.hw.ac.uk/radiate/. This can be used to replicate some of the results shown here, or for other work on scene reconstruction, object recognition and fusion.
The ZED stereo camera is protected by a waterproof housing for extreme weather and is set at 672 × 376 image resolution at 15 Hz. The LiDAR is a 32-channel, 10 Hz Velodyne HDL-32E giving 360° coverage. The radar is a 4 Hz Navtech CTS350-X which provides 360°, high-resolution range-azimuth images with a maximum operating range of 100 metres; example frames are shown in Figures 2 and 11. The FMCW radar data is fully compatible with our signal model. As stated, no current automotive LiDAR system allows full-wave operation, so we operated the Velodyne HDL-32E in single or dual pulse mode. However, the gamma model for returns from a homogeneous obscurant has been validated as discussed in Section 4. Therefore, we believe it is justifiable to use this real LiDAR (single peak mode) data to determine the position of the surface return, and to use the knowledge of the published work [14-16] to simulate the addition of fog to the real LiDAR returns, as representative of what we would expect from a full-wave automotive LiDAR. Figures 9-11 show examples of data collected on a field trip. In the case of the LiDAR image, the point cloud is cumulative, acquired from many scans as the vehicle moves, since the sampling in a single scan is relatively sparse in the vertical direction (32 channels). We have calibration parameters for the different sensors, which operate at different data acquisition speeds, and we have fine-tuned the registration using feature matching.
We perform signal analysis along the forward direction, as shown in Figure 11. Figure 12 illustrates the radar power signal and the CFAR detection threshold that is used to provide the alternative binary waveform used in Equation (6). Although Figure 10 shows many points, each channel only detects a single surface (or other) return on each laser pulse, and that is the raw data of interest here, as opposed to LiDAR processing strategies that use additional spatial or temporal constraints within or between image pixels and frames to build complete images [25,26]. Of course, any such strategy is valuable given appropriate constraints, but we consider here only single-element range signal analysis along the optical line of sight and do not consider full surface reconstruction, as that is out of scope. Clearly, methods that make use of local neighbourhoods or compressive sensing can be combined with the analysis here. Figure 13 shows a full waveform generated from the LiDAR point cloud, one of the many hits at the wall at 19.44 metres, subject to the effects of obscurant described in Section 4.2. We do not have a magnitude value, as the Velodyne data is pre-detected, so we choose a value of 2 counts, similar to the simulated case. We generate multiple random LiDAR signals of the type shown in Figure 13 and, as with the simulated data, run multiple trials using the real LiDAR locations and the concurrent real radar data. Figures 14 and 15 show the positions of the detected signal with and without radar, and Table 2 tabulates the mean values for both position and amplitude, together with the standard deviation. We used LiDAR returns from the wall at 19.44 metres and the chimney at 25.17 metres as examples. We used the two forms of the radar term in Equation (6): first, the power spectrum for returns at 19.44 metres; second, the CFAR detections for returns at 25.17 metres, with reference to Equation (6) and Figure 12.
The question is whether one considers the strength of the signal or the simple presence of a detected return as the most relevant information provided by the radar. In the particular example shown, radar returns from the wall at 19.44 metres are stronger than those from the roof and chimney at 25.17 metres, although both are detected by the CFAR algorithm.
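A cell-averaging CFAR detector of the kind used to threshold the radar power signal can be sketched as follows. Whether the processing applied to the Navtech data is exactly this cell-averaging variant is an assumption; the guard/training-cell structure and false alarm rate follow the parameters quoted in the Figure 12 caption, although the test below uses smaller illustrative values.

```python
def ca_cfar_detect(power, n_guard, n_train, pfa):
    """Cell-averaging CFAR sketch. For each cell under test, the noise level
    is estimated from n_train cells on each side (skipping n_guard guard
    cells), and the threshold factor is set from the false alarm rate pfa.
    Returns a binary detection list the same length as power."""
    n = len(power)
    det = [0] * n
    for cut in range(n):  # cell under test
        cells = []
        for off in range(n_guard + 1, n_guard + n_train + 1):
            for idx in (cut - off, cut + off):
                if 0 <= idx < n:
                    cells.append(power[idx])
        if not cells:
            continue
        m = len(cells)
        # standard CA-CFAR threshold factor for m training cells
        alpha = m * (pfa ** (-1.0 / m) - 1.0)
        noise = sum(cells) / m
        if power[cut] > alpha * noise:
            det[cut] = 1
    return det
```

The binary output of this stage, normalised to sum to one, is what enters the CFAR form of the radar likelihood term.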
From Table 2, there is much less difference in the performance with and without radar when the LiDAR return strength is relatively high (8 counts), although there are more outliers in detection without radar. Provided the modelling of the instrumental, ambient and fog contributions is appropriate, the MCMC process should find the return in almost all cases. When the return level drops to 2 counts (first column), that is, when the fog makes penetration by LiDAR more difficult, there is a significant improvement when using the radar data, and this is also true when examining the 25.17 (4) case. When the return level at 25.17 metres drops below 4 counts, there are many more errors, as the Markov chain uses the delayed rejection mechanism to explore possible returns in the area of dense fog returns, where the Poisson noise levels are higher. A fuller comparison of the 19.44 (2) case is possible from Figures 14 and 15, in which one can ascertain the number of returns within any desired limit of the true value.

FIGURE 12 Radar power spectrum for the Navtech image captured at the location shown in Figure 9. The threshold for CFAR detection (guard cells = 10, training cells = 300, false alarm rate = 0.1) is also shown.

FIGURE 13 LiDAR magnitude spectrum corresponding to the Navtech image. This uses real 3D data but generates the waveform synthetically. The LiDAR return is at 19.44 metres.

FIGURE 14 Histogram of signal detections with the radar waveform.
As with the simulated data, this allows a comparison of how many detections can be classed as 'true', that is, within a specified distance of the true value, and this shows much more clearly that although the mean value without radar may not be so far from the true value, in practice there are many false detections. To give a better idea of how the recovered signal appears, Figure 16 shows details of the fitting response to the signal with and without radar, in which the values with and without radar are both close to the true value as is usually the case when the amplitude is 8 counts.
However, the addition of fog in this experiment has assumed the medium was homogeneous, modelled by a gamma function. One of the key observations from Figure 3 and other work [15] is that if the medium is inhomogeneous, then the gamma function is only approximate. Figure 17 shows an example of what happens in this case, as the fog has now been simulated by a gamma function multiplied by a decaying sinusoid to model variation, but this is not known or encoded in Equation (2), which has only a gamma model. If the radar data is not used, the surface return is found incorrectly and invariably around 5 m, as the gamma distribution does not fit the inhomogeneous data well, so that the denser medium at 5 m is interpreted as a real surface return. However, given radar data in which the received power is low at 5 metres, the surface return is correctly located.
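The inhomogeneous-fog simulation described above, a gamma-shaped return multiplied by a decaying sinusoid, can be sketched as below. The modulation depth, period and decay parameters here are illustrative choices, not the values used in our experiment.

```python
import math

def inhomogeneous_fog_rate(i, beta_fog, shape, scale, mod_depth, period, decay):
    """Gamma-shaped fog return modulated by a decaying sinusoid to mimic
    denser and thinner fog patches (sketch with illustrative parameters).
    Keeping mod_depth < 1 guarantees the rate stays non-negative."""
    gamma_pdf = (i ** (shape - 1.0)) * math.exp(-i / scale) / (
        math.gamma(shape) * scale ** shape)
    modulation = 1.0 + mod_depth * math.exp(-i / decay) * math.sin(
        2.0 * math.pi * i / period)
    return beta_fog * gamma_pdf * modulation
```

Fitting the plain gamma model of Equation (2) to counts generated this way leaves residual peaks at the denser patches, which is exactly the failure mode that the radar term corrects.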

| CONCLUSIONS
We have presented a method that combines LiDAR and radar signals to find 3D point data when affected by an intervening medium such as fog. The key advantage of combining the LiDAR and radar data, compared to the use of radar data alone, is that the surface is located much more exactly in azimuth and elevation by the LiDAR pointing direction, with an additional improvement in range resolution. The key advantage compared to the use of LiDAR data alone is that the signal is more likely to be located within the medium, and is less likely to respond to outliers generated by denser fog patches. This improves the likelihood of surface detection and narrows the spread of the depth values, as shown for example in Figure 14 compared to Figure 15. Our approach derives from our earlier work on MCMC exploration of the solution space [21]. We have introduced a distribution function to model the obscuring medium, validated by several laboratory trials, and have encoded the likelihood function using both radar and LiDAR signal data. In the absence of a complete physical model for the radar, the radar signal has been used in two ways: first, by allowing the power received to improve the likelihood of a surface return proportionately, and second, by using a CFAR detection process to improve the likelihood as a binary decision. Our analysis has been applied to simulations and to real data collected in the wild by our vehicle. In the latter case, we have simulated the obscuring medium to specify equivalent full-wave LiDAR returns, and building on the laboratory trials, this gives us confidence that the approach has promise.
There is considerable room for future work to improve this study. First, the use of simulation of the obscuring medium was necessary because no suitable benchmark data exists, and current automotive LiDARs with full scanning do not have full-wave capability. We are working on the procurement and installation of full-wave automotive radar and LiDAR systems. Second, the radar function does not fully model the physical processes of data formation. Full modelling of radar signal returns in a road network is especially difficult, but we would hope to go further in making a more realistic likelihood function based on scene modelling as well as the respective beam geometries. Third, given that a single diverging radar beam encloses many LiDAR beam directions, we should improve scene reconstruction by taking better account of the respective beam patterns and incorporating spatial constraints. Working with single-source, obscured LiDAR data from our lab trials, excellent depth reconstruction was possible with gated LiDAR data [16]. Recent work on accelerated processing [20,27] of LiDAR alone has led to processing times as low as 3 ms for a frame size of 200 × 200 pixels on standard hardware. This suggests that future work should also examine alternative processing strategies to MCMC exploration of the solution space, to design algorithms that meet the real-time requirements of automotive sensing.