Inversion probability enhancement of all-fiber CDWL by noise modeling and robust fitting

Accurate power spectrum analysis of weak backscattered signals are the primary constraint in long-distance coherent Doppler wind lidar (CDWL) applications. To study the atmospheric boundary layer, an all-fiber CDWL with 300µJ pulse energy is developed. In principle, the coherent detection method can approach the quantum limit sensitivity if the noise in the photodetector output is dominated by the shot noise of the local oscillator. In practice, however, abnormal power spectra occur randomly, resulting in error estimation and low inversion probability. This phenomenon is theoretically analyzed and shown to be due to the leakage of a time-varying DC noise of the balanced detector. Thus, a correction algorithm with accurate noise modeling is proposed and demonstrated. The accuracy of radial velocity, carrier-to-noise ratio (CNR), and spectral width are improved. In wind profiling process, a robust sine-wave fitting algorithm with data quality control is adopted in the velocity-azimuth display (VAD) scanning detection. Finally, in 5-day continuous wind detection, the inversion probability is tremendously enhanced. As an example, it is increased from 8.6% to 52.1% at the height of 4 km.

introduce time-varying noise to the spectra as the amplifier chain comes out of saturation. Some commercial lidars also suffered time-varying noise behavior but the underlying mechanism may be different [31][32][33]. Researchers have tried to correct the signal with different noise modeling.
Accurate noise modeling is helpful in improving the inversion probability of radial wind velocity, especially in the weak signal regime. Spectra calculated from the tail of the return are typically used to determine the noise background. Noise modeling methods applied to it, such as polynomial fitting [34] and linear predictive coding [35], are used in the spectral data processing. Background correction algorithms directly performed on the signal-to-noise ratio are also proposed to correct artifacts in the background noise outputted by the instrument [32].
In this study, an all-fiber CDWL with pulse energy of 300 µJ has been developed for long-range wind and aerosol detection. The lidar works in VAD scanning mode for wind profile detection. The raw CNR and spectral width detected by the lidar suffer from unknown interference, resulting in estimate errors in the CNR, spectral width and Doppler shift. Theoretical analysis and correction algorithm are proposed to solve this problem. A robust sine-wave fitting algorithm is proposed for wind retrieval. Continuous observation results of wind profiles are also presented. Finally, statistics are drawn to see the enhancement of inversion probability.

Instrument
The all-fiber coherent Doppler lidar demonstrated here is developed for tropospheric wind and aerosol detection. The optical layout is exactly as the one we reported in previous works [17].
The key parameters of the system are listed in Table 1. The 1.5 µm laser is applied in the lidar, as it shows the highest maximum permissible exposure to human eyes in the wavelength ranging from 0.3 to 10 µm [36,37]. The laser system manufactured by Keopsys adopts a master oscillator power amplifier (MOPA) design. The continuous-wave seed laser is split into two parts, one as the local oscillator (LO) light, the other is modulated into pulse trains with 80 MHz frequency shift and 10k repetition rate by an acousto-optic modulator (AOM). The pulses are pre-amplified by the master laser and then send to the erbium-doped fiber amplifier (EDFA). The pulse Extinction Ratio on the master laser pulsed port is larger than 80 dB. In the EDFA, the stimulated Brillouin scattering (SBS) may damage the amplifier and limit the peak output power. The temperature and strain can change the peak position of the SBS spectrum [38,39]. Thus, by introducing different temperature or strain distribution to the erbium-doped fiber, the power threshold of SBS can be increased. After three stage amplifiers, the output pulse energy reaches to 300 µJ.
The EDFA output fiber provided by the manufacture is a 25-µm-diameter core, panda, large mode area fiber with a pigtail length of 35 cm. The fiber end is fixed at the focal point of an aspheric lens with a diameter of 100 mm. The laser pulses are collimated and transmitted to the atmosphere through the fiber-optic coupled telescope. The transmitter/receiver switch is realized by a fiber circulator integrated into the EDFA. An optical switch (OS) with 30 dB isolation ratio is utilized after the circulator to increase the optical isolation of strong specular reflection and avoid the saturation of the detector and electrical amplifier. The specular reflection may come from three places, the circulator, the end face of output fiber, and the telescope lens. According to our calculation and test, the reflection from the telescope dominates. Note that the power of specular reflection after isolation is still much strong than that of the atmospheric backscattered signal. After passing through the circulator and optical switch, the received backscattered signal is mixed with the LO light and then detected by a balanced detector. Controlling software is developed to achieve real-time data processing and unattended operation.
In the data processing stage, the sampled signal is divided into 200 range gates before calculating the spectra. The first 100 range gates are set as 75 samples per gate with overlap of 25 samples. The next 50 range gates include 100 samples per gate with no overlap, and for the last 50 gates, the samples are 250 with no overlap. The reduced data resolution along the detection range is used to guarantee the detection probability in the far-field where the signal is weak. The FFT algorithm is then performed and the estimated power spectra are saved as raw data. All spectra are calculated with a 512-point FFT, which is implemented by zero-padding to the samples to reduce the 'picket fence' effect. The instrument also outputs parameters including the signal intensity (wideband CNR), radial velocity and spectral width.
During the experiments, the lidars work in VAD scanning mode for wind profile detection. The instruments are placed on the campus of the University of Science and Technology of China (31.83°N, 117.25°E), as shown in Fig. 1. The azimuth scanning range is set as 0-300°and the elevation angle is 70°. The scanning interval is 5°and a total of 60 radial profiles are obtained for each scanning circle, lasting 135 s.  Figure 2 gives an example of the radial measurement results. The results are directly outputted by the lidar without any correction. Vertical streaks can be seen in the time-height plot of both CNR and spectral width. Obviously, these streaks are not atmospheric backscattered signals but some unidentified interference. Such interference in CNR will influence the accuracy of other retrieved results based on it. Such as the Cramer-Rao Lower Bound (CRLB) that is often used to calculate the uncertainty of velocity estimation [40]. The spectral width suffers more serious problems. One third of the results under 3 km is covered by some red fence-like interference signal. The spectral width is an important parameter for scientific researches. For example, it has been used to identify precipitation events, where double-peak spectra are observed [22]. Besides, it is also regarded as an indicator of turbulence intensity. In the next section, we will analyze the source of the noise and propose a correction algorithm.

Noise identification
In the coherent receiver, the return signal is mixed with a continuous wave local oscillator beam. A balanced detector is usually used to detect the mixed-signal, which can achieve higher suppression on the relative intensity noise (RIN) of the LO light [41]. The output photocurrent can be expressed as [42] i where the first term is the intermediate frequency (IF) photocurrent, i.e. signal current. The second term is the noise current, which is dominated by the shot noise of the LO light for a well-designed coherent receiver. This noise is often modeled as a zero-mean Gaussian random process. The last term is the baseband current, which should be zero for a perfect balanced detector. The three terms are independent of each other. The periodogram calculated by the FFT algorithm is utilized to estimate the power spectrum. Incoherent accumulation is used to improve detection probability and velocity estimation accuracy. The accumulated spectra can be express as where the three terms on the right side are power spectra corresponding to the three photocurrents in Eq. (1), respectively. S s (f , t) is a narrowband signal spectrum and central at 80 MHz if the Doppler shift of the target is zero. S n (f , t) is a broadband noise. The baseband current spectrum S DC (f , t) locates at 0 MHz. However, once there is a baseband current leakage, sidelobes are introduced in the spectrum due to the 'window effect' of FFT. The rectangular window is used here because of its advantage of narrow main lobe, but it also suffers from high sidelobe intensity. These sidelobes can even extend to the range of Doppler shifted signal. It can cause serious interference on identifying signal peaks and result in errors during estimating the velocity, CNR, spectral width and other parameters. Here, we call the interference due to sidelobes as DC leakage induced noise. An example of the accumulated power spectra outputted by the lidar is shown in Fig. 3. The jump of the noise level at the 101st and 151st range gate is a result of changed data resolution.
The large signal around the 20th range gate (in red color) corresponds to the specular reflection from the output optics. The signals after that are atmosphere backscattered signals. The baseband current spectrum S DC (f , t) can be seen clearly. Its intensity varies quickly in the range-frequency space.

Noise correction algorithm
In order to improve the inversion probability, we tried to eliminate the DC noise by post-processing the raw power spectra outputted by the lidar. The workflow of correction algorithm is shown in Fig. 4. Firstly, normalized curve models of the noiseŜ n (f ) andŜ DC (f ) are obtained. The curve of background noise is obtained by the raw power spectra when there are no atmospheric signal and DC noise. Because the optical switch is also set to off at the range gates in front of the specular reflection signal, noise can be obtained from these gates. Ten-minute accumulation is performed to smooth the random fluctuations on the curve, as shown in Fig. 5(a). Three curves obtained at different times are given for comparison. The two curves below are downshifted 1 dB one after the other. The non-flat shape of the curve is mainly determined by the gain characteristics of the electrical amplifier integrated into the detector, and it keeps unchanged for a long time. The normalized curve model of the DC noise can be calculated theoretically. We assume that the baseband current i DC (t) remains constant at each range gate for a single pulse shot. We know that the Fourier transform of a continuous rectangular function is a sinc function. But due to the discrete sampling, the result of the discrete Fourier transform is a little different. As shown in Fig. 5(b), the spectrumŜ DC (f ) is calculated, where M is the number of samples in a range gate. Secondly, the level of the background noise S n (f , t) is determined for each accumulated power spectrum. There are no available atmospheric signals in the range gates before specular reflection. The spectra in those range gates are averaged and then used to determine the background noise level by minimizing the following objective function where a and b are the coefficients to be solved. f M = 80 MHz is the preset frequency shift of the transmitted laser. B w is the searching bandwidth. The background noise S n (f , t) is then given by the variation of S n (f , t) in different range gates is ignored here. Due to different data resolution, the background noise level will be multiplied by a factor of 100/75 and 250/75 at the range gates between 101-150th and 151-200th, respectively. Thirdly, the level of DC noise S DC (f , t) is calculated for each range gate. Due to the fixed relation between the main lobe and sidelobes, the intensity of main lobe is enough to determine the curve of DC noise.
where t i corresponds to the i-th range gate. Finally, the two kinds of noise are corrected. Note that there are both multiplicative and additive effects impacting the spectra. The DC noise is removed from the raw spectra via subtraction, while the background noise is processed by division (i.e. noise whitening). Then the velocity, wideband CNR and spectral width are calculated based on the one-peak Gaussian fitting method. The wideband CNR is calculated by the ratio of total signal power to noise power over the searching bandwidth. The spectral width is estimated by the ratio of total signal power to the peak power value. Examples of the whitened spectra are given in Fig. 6. Before correction to the DC noise, the signal peak is surrounded by fluctuations. The superposition of them results in a deviation to the signal peak position which is used to determine the radial velocity. Besides, the signal power (or CNR) will be overestimated due to the appended noise peak. In Fig. 6(b), the overlap of signal peak and one noise peak results in a fake 'two-peak signal' as often appears during rainy conditions [22]. As a result, the calculated spectral width will be much larger than the actual value. After the correction, those problems are resolved. The detection probability and data availability are also improved.

Performance validation
In order to validate the performance of the correction method, we applied the algorithm to continuous dataset. The corrected results of December 20, 2019, are plotted again in Fig. 7. One can see that, in the time-height plot of both CNR and spectral width, the interference of vertical streaks disappears. Several thin aerosol layers can be found at the height about 2-5 km.  Figure 8 shows results comparison of CNR, spectral width and radial velocity before and after the correction at 10:16 am. After correction, the CNR decreases because the contribution from the intermittent noise is removed, as shown in the range of 0.5-2 km in Fig. 8(a). The overestimate of the spectral width is corrected as shown in Fig. 8(b). It's noted that the relatively large fluctuation of the spectral width around 3 km is due to large estimation errors at low CNR. The correction also improves the accuracy of radial wind estimation.
The statistical results of corrected and uncorrected data on December 20, 2019, are given in Fig. 9. The histograms of background CNR are shown in Fig. 9(a). Compared with uncorrected data, the corrected background CNR is centered at zero, and the distribution width is narrower. The spectral width is analyzed for the data below 3 km to avoid the spectrum broaden in clouds. Besides, in order to reduce the influence of large estimate errors that happened in the weak signal regime, only the data with CNR larger than −25 dB is included in the statistics. As shown in Fig. 9(b), The overestimate of spectral width is corrected. The statistical results show that the correction algorithm eliminated the DC leakage induced interference.

Wind vector retrieval method
A robust sine wave fitting (RSWF) method is proposed to retrieval the wind vector, which improves the inversion probability in the weak signal regime. The ordinary least-square method (usually referenced as direct sine wave fitting, DSWF) is sensitive to the outliers that often occur at low CNR. The fitting results can be significantly affected by those outliers. A solution is to screen the estimated radial velocities by a CNR threshold before applying the fitting [43,44]. However, the accuracy of the CNR estimate in the weak signal regime is limited and a specific threshold cannot separate reliable and unreliable data automatically. A higher threshold will result in a loss of more reliable data. Filtered sine wave fitting (FSWF) method shows good performance [45,46]. It weights the contribution of different radial wind velocities to the wind vector with an exponential function. But there is no analytical solution to the method and global optimization is computationally costly. Researchers also proposed a method that reweights the contribution of different radial wind velocities by the residual of the fitting results, but the reliability of radial velocity determined by CNR is not considered. Thus, the method cannot work under severe turbulence even with high CNR [47].
Here, we propose a robust sine wave fitting method that weights the contribution with a combination of CNR and fitting residual. The objective function is given as where V i is the radial wind velocity, S i is the unit vector directed along the probing beam axis, w i is the weight coefficient, σ 2 i is the instrumental measurement precision, V is the wind vector to be calculated. The actual instrument precision can be estimated from Doppler lidar measurement [48,49]. Here, for simplicity, the theoretical variance of the Doppler estimate is used [50] where F s is the sampling frequency, N is the number of accumulated laser shots, ∆f is normalized spectral width. Before retrieving the wind vector, a low CNR threshold C 1 = −35dB is given to screen the data. Most unreliable velocities are removed and the majority of reliable velocities remain. Then the following iterative procedure is performed.
1. The initial value of w i (t=0) = 1 is given, and the initial wind vectorV (t=0) is calculated bŷ 2. The absolute errors between measured radial wind velocity V i and the fitted resultsV (t) i are calculated by 3. The new value of w i for the next step is determined by 4. The iteration stops until w (t+1) i ≡ w (t) i . In step 3, σ s is the typical Doppler spectral width (in velocity units) considering the turbulence. C 2 = −25dB is another CNR threshold with higher value to make sure all the bad estimates can be screened. The contribution of the outliers with CNR lower than C 2 are set to zero in the next iteration. The iteration continues unless no more outliers are found. Here, we consider the radial velocities with CNR higher than −25 dB as reliable, this will be helpful for the iteration to converge to the right results, especially at high altitude where the reliable velocities from cirrus are often surrounding by noise in a scanning loop. Figure 10 gives 5-days continuous experiment results from December 10th to 14th, 2019. The atmospheric temperature and visibility recorded by a co-located weather transmitter (Vaisala WXT520) and a visibility sensor (Vaisala PWD50) are also given in Fig. 10(d). The detection height can approach 10 km occasionally. The horizontal wind speed in high altitude is much larger than that in low altitude, as measured using the Rayleigh Doppler lidar [25]. On December 13th and 14th, the high-altitude wind speed reaches about 35 m/s. The wind direction is defined relative to the north wind (0°) clockwisely. The vertical wind speed is around zero, sometimes the downward wind about 1 m/s is observed in clouds. The variance of vertical wind speed is an indicator of turbulence [20]. In Fig. 10(c), diurnal variance of vertical speed in the boundary layer associated with the atmospheric temperature change is observed. Due to the relative low temperature in the winter, the convective activity is not as strong as that in the summer [20]. A typical comparison example of the horizontal wind speed and direction between the new algorithm RSWF (introduced in Sect. 4.1) and the usual method DSWF is shown in Fig. 11. The data are chosen from the height of 2 km on December 12th, 2019 without quality control. A significant reduction in the scatter of the inversion results can be seen after using the new algorithm. After data quality control, the inversion probability enhancement of the new retrieval algorithm is shown by drawing a statistic comparison between it and the usual method, as shown in Fig. 12. As the CNR decreases versus the height, the new algorithm RSWF shows obvious improvement relative to the direct sine-wave fitting (DSWF).

Conclusion
In this paper, an all-fiber CDWL with 300 µJ pulse energy was developed for tropospheric wind detection. The instrument suffered a DC leakage induced noise, resulting in biases in the retrieved velocity, CNR, spectral width and other parameters. In our experiment, once the specular reflection is too strong, this phenomenon will occur. It may be the response differences of the two PIN diodes in the balanced detector that produce a biased DC component and saturate the electrical amplifier following it. Then, the amplifier chain recovers from the saturation state with a time-varying response, leading to a decaying DC leakage. In this work, the DC leakage is resolved by using the software method. In the near future, we will confirm the mechanism of the DC leakage. A new hardware update will be used in the design of the balanced detector by adding a band-pass filter between the balanced detector and ADC.
We demonstrated that the accuracy of velocity, CNR, and spectral width are improved with the new algorithm. The false detection probability of precipitation events identified by the spectral width was reduced. A robust sine wave fitting algorithm was also proposed and applied to retrieve wind vectors. The inversion probability of wind increases substantially, especially in the weak signal regime.

Disclosures
The authors declare that there are no conflicts of interest related to this article.