Edinburgh Research Explorer Three-dimensional imaging of stationary and moving targets in turbid underwater environments using a single-photon detector array

: Three-dimensional imaging in underwater environments was investigated using a picosecond resolution silicon single-photon avalanche diode (SPAD) detector array fabricated in complementary metal-oxide semiconductor (CMOS) technology. Each detector in the 192 × 128 SPAD array had an individual time-to-digital converter allowing rapid, simultaneous acquisition of data for the entire array using the time-correlated single-photon counting approach. A picosecond pulsed laser diode source operating at a wavelength of 670 nm was used to illuminate the underwater scenes, emitting an average optical power up to 8 mW. Both stationary and moving targets were imaged under a variety of underwater scattering conditions. The acquisition of depth and intensity videos of moving targets was demonstrated in dark laboratory conditions through scattering water, equivalent to having up to 6.7 attenuation lengths between the transceiver and target. Data were analyzed using a pixel-wise approach, as well as an image processing algorithm based on a median ﬁlter and polynomial approximation.


Introduction
High resolution optical imaging in turbid underwater scenarios remains a major challenge for the photonics community. This is especially true for the cases of three-dimensional optical imaging, and the imaging of moving targets. The main limitations in underwater active optical imaging are caused by absorption and scattering (forward and back), which can result in significant signal attenuation in relatively short propagation distances. Typically, the effect of water absorption on the illuminating beam is reduced by using a wavelength in the blue/green part of the spectrum, however scattering effects tend to dominate as the wavelength decreases. Hence, the optimum wavelength for propagation in water will depend on the level of scattering, and is typically in the region of 450-700nm [1]. There are a number of techniques that can be used to discriminate the scattered light from the target return, which are described in the literature [2,3]. An example is the synchronous scan technique, where a highly collimated laser source scans the target and a receiver with a narrow field of view collects the photon returns. To date, this approach has obtained the best results in discriminating the scattering events, and retrieves target information at more than 6 attenuation lengths (AL) between transceiver and target [4], where an attenuation length is the distance over which the signal intensity is attenuated to 1/e of its original value. Additionally, backscattering can be excluded by using a gated detection system, which allows optical detection only in a short temporal window -this greatly improves the contrast of the image and the achievable range to more than 7 AL. For example, pulsed-gated laser line scanning systems have achieved intensity imaging at 7.6 AL at 7 meters range [5]. Currently, underwater moving target data are acquired in two dimensions using complementary metal oxide semiconductor (CMOS) [6] or Si-based electron-multiplying charge coupled device (EMCCD) [7] based sensors, which are able to detect low light levels with a high pixel resolution (≥ 1024 × 1024). These cameras, however, are not equipped with picosecond timing electronics, and full depth information needs to be obtained with alternative techniques. For example, the CMOS based system described in [6] uses a Q-switched laser emitting a 3.5 W average optical power beam at the wavelength 532 nm and obtained depth information with a range gated sweeping technique, which allows three dimensional imaging at up to 4.5 AL at 8 meters and a depth resolution of a few centimeters, depending on water turbidity.
Over the last few years, time-correlated single-photon counting (TCSPC) has emerged as a detection approach for high performance LiDAR and depth profiling in challenging free-space environments. The TCSPC technique is a statistical sampling method that records the arrival time of a photon with respect to a synchronization signal, with picosecond temporal resolution. Examples of such applications of TCSPC include depth imaging through smoke or fog [8], depth profiling of targets hidden behind camouflage [9], kilometer range depth profiling [10,11], and for moving targets in complex scenes [12]. Recently, TCSPC has been used for imaging through highly scattering underwater environments of up to 9 AL [13,14] between transceiver and target whilst using only a few milliwatts of average optical power. This work used a monostatic transceiver with optical scanning in conjunction with an individual silicon single-photon avalanche diode (SPAD) detector in TCSPC mode. This single-pixel detector configuration resulted in a narrow optical field of view (of 0.5 mrad) which when scanned achieved high spatial resolution imaging (< 60 µrad angular resolution) even through highly scattering environments of up to 8 AL. However, such excellent optical performance came at the cost of long acquisition times per frame, which limited the practicality of this approach for the imaging of moving targets, for example [15]. In general, the disadvantage of this single-pixel scanning approach is that longer measurement times are required in order to scan the target.
Recent advances in silicon CMOS SPAD arrays have allowed time-resolved imaging at frame rates in the order of several kHz and with high spatial resolution (> 70,000 pixels) [16] for several applications, including fluorescence lifetime imaging microscopy (FLIM) [17,18], and free-space three-dimensional imaging [19][20][21][22]. Several SPAD array sensors have been developed with integrated TCSPC electronics in each pixel and parallel readout for acquisition frame rates in the order of kHz [23,24]. However, in order to achieve such fast acquisition frame rates, generally the pixel format is limited up to approximately 64 × 32 pixels [25]. Higher pixel resolutions have been demonstrated, but the readout process typically takes longer, drastically reducing the acquisition frame rate [21,26]. The incorporation of TCSPC electronics on each pixel has previously resulted in a reduction of the detector array fill-factor (i.e., the ratio between the photo-sensitive area of the pixel and the total area of the pixel [27]). In many cases, the fill-factor was lower than 5% [27,28], leading to an overall reduction in array efficiency. Microlenses have been used to improve the fill factor of the pixel, by concentrating the incident photons onto the active area of the devices [27,29,30]. More recently, different architectures have been used to improve the fill factor of a pixel, including analog approaches [31] or 3D-stacked CMOS technology [32,33].
This paper describes an investigation of the potential of SPAD detector arrays, operating at frame rates approaching 1 kHz, for single-photon depth imaging of stationary and moving targets in an underwater environment. The SPAD detector array used in this work had 192 × 128 pixels with TCSPC electronics integrated into each individual pixel [34]. It was fabricated in 40-nm CMOS technology with the pixels having dimensions of 18.4 × 9.2 µm and an active area of 22 µm 2 , resulting in a fill factor of 13%. The single-photon detection efficiency of the SPADs was approximately 7% at the operational wavelength of 670 nm [35], leading to an overall detection probability of 0.9% when the effect of fill-factor is included. The time to digital converters (TDC) integrated in this sensor are the smallest reported to date (9.2 µm × 9.2 µm). The timing bin width of the sensor could be varied from 33 ps to 120 ps, allowing a tunable single shot depth resolution at the sub-centimeter level. Henderson et al. [34] includes a detailed description of the detector array architecture. This SPAD array outputs data in the form of binary frames, where each pixel can contain a "0" denoting the absence of an event, or a "1" denoting the presence of a time-tagged photon event or occasionally a time-tagged dark event. These events are time-tagged with respect to the preceding laser pulse output, to provide a time-of-flight for return photon events. For this work, these binary frames were recorded by the sensor at a rate of between 500 and 990 Hz. In order to acquire data for a single binary frame, the SPAD detectors were active for a pre-selected time duration of 0.01 ms, 0.1 ms, or 1 ms, with the exact value typically being chosen on the basis of the level of signal return. To the best of our knowledge, this work represents the first application of SPAD detector arrays to underwater imaging, allowing single-photon depth and intensity profiling in highly scattering conditions at binary frame acquisition rates in excess of 500 Hz.

Experimental setup
Depth and intensity profiles were obtained with a CMOS SPAD detector array used in conjunction with a bi-static optical transceiver system. A schematic of the laboratory-based experimental setup is shown in Fig. 1 and a summary of the main parameters is listed in Table 1. The illumination was provided by a pulsed semiconductor laser diode (PicoQuant LDH670, PicoQuant GmbH, Germany) with central wavelength λ = 670 nm. This wavelength was selected to maximize transmittance of light in the highly scattering water, as described previously [13]. A mirror on a kinematic flip mount (FM1) was used to switch the laser beam between two optical configurations: water transmittance measurement or underwater depth profile acquisition. The light was directed into a 110 liter capacity glass tank (1750 mm long, 250 mm high, and 250 mm wide), where the target was placed in the water at a distance of approximately 1.7 m from the end of the tank nearest the SPAD array.
Transmittance measurements were performed to estimate the attenuation coefficient of the environment. As the light propagates a distance d in water with attenuation coefficient α, the average optical power P is given by [36]: (1) where P 0 is the initial value of the average optical power, and αd gives the number of attenuation lengths. In these experiments, the propagation medium was unfiltered tap water to which Maalox suspension (a commercially available medicine) was added -the Maalox in the water acted as a scattering agent with negligible effect on the absorption of the illumination [37]. This artificial environment was chosen to simulate natural waters because its volume scattering function is similar to that measured for several natural sea waters [38]. Different concentrations of Maalox in water were used to increase the level of scattering of the environment thus increasing the number of attenuation lengths between the system and the target (without changing the actual stand-off distance of the target). In a real environment, the same number of attenuation lengths would be equivalent to longer stand-off distances and a reduced attenuation coefficient, which will be the case at lower levels of scattering in open ocean waters, for example. The beam from the diode laser source was coupled into a 105 µm diameter core optical fiber in order to obtain a circular beam. The light from the output of the fiber was collimated and directed into the glass tank where a 1 meter long dovetail rail was aligned with the side-wall of the tank. A second mirror at an angle of incidence of 45°to the beam, and on a kinematic flip mount, (FM2) was fixed on the rail and used to direct the beam out through the side wall of the tank, allowing the power reading to be taken at the position P in . By removing the flip mirror FM2, the light was incident on a second mirror placed at a distance d = 0.5 meters away, and the optical power measurement was repeated at the position P out . The transmittance of the propagation medium over half a meter was then obtained as the ratio of P out and P in and can be related to the attenuation coefficient, α, in the following manner: From this estimation of α, the number of attenuation lengths to the target were obtained immediately prior to each depth profiling measurement, assuming the target distance was known.
After the estimation of α, the SPAD detector array based transceiver was used to acquire depth and intensity profiles of targets in water. The beam from the laser source was directed through two engineered diffusers (D1 and D2 in Fig. 1) in order to achieve a relatively uniform illumination over the target area in the field of view of the sensor. The light scattered back from the target was collected using a commercially available 50 mm effective focal length objective lens, which imaged the target onto the sensor. The f-number of the objective lens was set to 1.8 in order to maximize the return signal from the target. The SPAD array sensor operated in TCSPC mode and the synchronization signal was provided by the laser system with a 40 MHz repetition rate, which corresponded to an overall histogram length of 25 ns between adjacent optical pulses. This period corresponds to a range to the target of 2.8 meters in water, when the round-trip distance is taken into account, and ensured only one individual pulse was in transit between system and target at any one time for this 1.7 m target range. For longer target distances, operation at a fixed repetition rate can lead to range aliasing, as multiple pulses in transit will lead to ambiguity in the return distance. To avoid range aliasing at longer target ranges, use of a lower laser repetition rate is required, or use of a combination of different laser repetition frequencies, or use of a known non-periodic pulse train as demonstrated previously [39].
The timing bins were set to 34 ps in these experiments, which is equivalent to a 3.8 mm depth measurement in water. This meant that 736 timing bins were used in each histogram for the fixed laser repetition rate of 40 MHz. A system instrumental response was recorded using a reference Lambertian target with a nominal overall reflectance of 99% (Spectralon Diffuse Reflectance Target, Labsphere) which was set approximately at normal incidence to the axis of the objective lens. The instrumental measurement was performed by acquiring 20,000 binary frames of 1 ms binary frame acquisition time, and recording the photon return for all the pixels. The Full-Width at Half Maximum (FWHM) of the return peak in the histograms was determined by averaging timing histograms from the whole array after excluding the hot pixels, i.e. those pixels with a much higher dark count rate than average [40]. This resulted in an estimation for the instrumental response of approximately 350 ps FWHM.
During a TCSPC measurement, each pixel was capable of detecting only one event per binary frame, therefore each binary frame is equivalent to a bit plane, where 0 represents no events recorded and 1 represents one event recorded. For the duration of the binary frame acquisition time, each SPAD pixel is active until it detects an event or receives the reset signal. A field-programmable gate array (FPGA) board (Opal Kelly XEM6310-LX150) was used to configure the sensor and due to limitations of the firmware there was an inter-frame delay (t ifd ) of approximately 1 ms between two consecutive binary frames. This means that the overall acquisition time, T Acq , for a measurement consisting of N bf binary frames was equivalent to where t bf is the binary frame acquisition time. In the experiments discussed in this paper, binary frame acquisition times of 0.01 ms, 0.1 ms, and 1 ms were used which, as a result of this inter-frame delay, was equivalent to binary frame acquisition rates of 990 Hz, 909 Hz, and 500 Hz, respectively.

Cross-correlation
A number, N bf , of binary frames of the target scene was recorded during each measurement.
If an event was detected by a pixel, its bin position was stored in a timing histogram for that pixel. A timing histogram was then constructed for each individual pixel by summing several binary frames together. The depth and intensity information for each pixel were obtained using a cross-correlation analysis. The cross-correlation, X, was calculated for each pixel between the histogram recorded, H, and a per-pixel instrumental response, I, both with k timing bins, as The highest value of cross-correlation gave an estimation of the time-of-flight of the photon with respect to the reference for each individual pixel, and a depth profile of the target area was reconstructed from depth information for the entire array. At the same time, the intensity information was obtained by collecting the number of events around the maximum of the cross-correlation, in an interval equal to twice the average timing jitter of the camera. The instrumental response was recorded as explained in the previous section. The results were displayed over a depth range of 100 mm to be consistent with the depth of the target objects used throughout this paper. An arbitrary zero position was used to enhance the presentation of the color depth scale, and not the absolute target depth of approximately 1.7 meters. The cross-correlation approach is an effective technique for finding the location of the highest peak in the histogram, and provides a robust estimation of the time-of-flight of the return photons in favorable conditions, such as long acquisition times and/or the absence of scattering events. Of course, as the detected photon level decreases, the depth estimation will have reduced resolution [41]. Additionally, in the presence of high background levels, the signal to noise ratio reduces, consequently increasing the likelihood of an incorrect depth value being estimated. The depth estimation can be improved by performing the cross-correlation approach over a small timing bin interval centered on the expected target return time. In this work, this form of software gating was applied in all the data analysis, using an interval of 200 bins approximately centered on the expected target return. This avoided much of the backscattered light in the water tank, consequently improving the depth and intensity profiles. Such an approach, however, requires a priori knowledge of the target position, which typically is not known in a final application. Advanced image processing techniques can be used to address this problem. For example, Tachella at al. [12] presented a state-of-the art algorithm to reconstruct complex scenarios from single-photon data using the entire histogram. This analysis was demonstrated for long-range depth imaging of complex media in free-space and will be investigated for scattering environments such as underwater scenarios.
Another source of error is due to hot pixels, inclusion of which can result in additional errors in the overall image depth and intensity estimates. Therefore, the profiles obtained with the cross-correlation approach were further processed to remove these pre-determined hot pixels. The hot pixels were identified by performing a measurement of 20,000 binary frames in fully dark conditions in order to identify a threshold for hot pixel removal [42]. For the detector array used in this work, a hot pixel was defined as a pixel having a dark count level which occurs with more than 50% probability in a binary frame acquisition. In these measurements for a 1 ms binary frame acquisition time, this resulted in 6% of the pixels being classified as hot pixels and these were removed for the analysis.
In the case of moving targets, the images processed were arranged in sequence with a custom Matlab routine to obtain a video with a display video rate selected by the user.

Image reconstruction
When imaging through highly scattering environments and/or using very short acquisition times, the signal return is often so low that the cross-correlation approach used in conjunction with software gating is not always capable of discerning the target. In these extremely sparse photon cases, additional image processing techniques are required in order to improve the depth and intensity estimation. In this work, the depth and intensity profiles were obtained with the cross-correlation approach and the hot pixels were removed as described above. Then, a pixel-wise median filter was calculated over a 3 × 3 neighborhood pixel of the depth map in order to fill the empty pixels and smooth the image. A smaller depth range of interest was selected to discard more scattering events and the median filter was applied again to further smooth the depth information. By analyzing the reconstructed depth profile, pixels within the depth range of interest were used to select the corresponding pixels in the intensity map on which to perform the same reconstruction. In photon-starved conditions, the reconstructed depth and intensity profiles were further smoothed using a polynomial approximation (PA): each row, R i , of the median filtered matrix was approximated, by using a least squares method, as where i is the number of rows (i ∈ [1,192] in this case), x represents the pixels over the horizontal direction, and a j are the coefficients of the polynomial of degree n that best fit the data of the i-th row of the median filtered image. The degree of the polynomial was set to 10, which was manually chosen by the user as a compromise to reject as many scattering events as possible without over-smoothing the target. In the case n > 10, the model retained most of the scattering events without improving the image. In the case n < 10, the model was not able to identify the target, which eventually disappeared. The same approximation was performed for the columns of the image matrix (i ∈ [1, 128]), and the final image was obtained as an average of the row and column approximations.
It is important to note that this reconstruction technique works on the depth and intensity values obtained with the cross-correlation approach without taking into account the Poisson distribution of the data. Advanced image processing techniques have been developed over the last few years to reconstruct single-photon depth and intensity images in challenging environments [8,11,[43][44][45][46]. Such bespoke reconstruction algorithms will be the subject of our future work on underwater single-photon imaging.

Results
Several depth and intensity profile measurements were performed through water containing different scattering concentrations. The target used for these experiments is shown in the photograph in Fig. 2(a) and was a 13 mm thick, circular stainless steel flange with an external diameter of 70 mm. It had a central clear hole of 40 mm diameter, and a 48 mm diameter, 8 mm deep counterbore. Six, 7 mm diameter, holes were equally spaced around the flange (i.e. at 60 degree intervals) on a 58 mm diameter. The target shown in Fig. 2(b) was a Siemens star with an outer diameter of 40 mm, a clear center circle diameter of 5 mm, and 18 equal bar and space pairs (20°per cycle, 10°per spoke) radiating out from the center. This was laser-cut in a 3 mm thick sheet of white acrylic and was used to study the degradation of the spatial resolution in different scattering levels. Both targets were spray painted with an opaque primer in order to avoid causing specular reflections.

Stationary targets
The first experiment examined the depth imaging of stationary targets. A section of the flange was imaged in different scattering environments by acquiring 1000 binary frames with a binary frame acquisition time of 1 ms. The attenuation of the target return and the effect of backward and forward scattering is shown in Fig. 3, where the histogram of a single pixel containing target information is shown in the case of (a) unfiltered tap water (1.2 AL) and (b) highly scattering water (7.2 AL). The average power used in Fig. 3(a) and Fig. 3(b) was 0.4 mW and 8 mW, respectively, to account for the very different attenuation levels. In the absence of scattering the histogram shows a clear single peak return from the target. In contrast, once the scattering agent is added to the water, the target return is highly attenuated and appears to have a wider temporal profile, likely due to the forward scattered events. Additionally, the background due to backscattered light is dominant in the histogram, highlighting the importance of discarding these detection events in the cross-correlation process in order to retrieve the correct depth value. Note that an exponentially decreasing backscatter trend is not evident, likely due to the objective lens being focused on the more distant target.
A few examples of depth and intensity profiles of the flange target acquired in different environments are shown in Fig. 4. The top row of Fig. 4 shows the results obtained in unfiltered tap water, equivalent to 1.2 AL between system and target at λ = 670 nm. The total average optical power entering the tank was approximately 0.4 mW, which meant that the average optical power per pixel at the target was approximately 1 nW. This average optical power level was chosen as it allowed a sufficient return from the target without saturating the detector array. The central and bottom line of Fig. 4 show the results when the number of attenuation lengths between system and target was equivalent to 4.4 AL and 5.7 AL, respectively. For these measurements, the average optical power entering the tank was increased to 8 mW, the highest available power that would not compromise the pulse duration of the laser. The depth and intensity profiles obtained with the pixel-wise cross-correlation approach for the three environments are shown in columns Fig. 4(a) and Fig. 4(c), respectively. These results clearly show the degradation of the image resolution and contrast as the level of scattering increases. The profiles obtained in unfiltered tap water exhibit high spatial and depth resolution, allowing the reconstruction of the target using the cross-correlation approach. However, adding the scattering agent to water meant that increasingly more background photons and fewer return photons are collected leading to the greater likelihood of wrong depth evaluation and increased noise in the reconstructed images. In order to improve the profiles of the target, the depth and intensity information obtained with the cross-correlation approach were reconstructed using the median filter model, as shown in Fig. 4(b) and Fig. 4(d), respectively. The filtered profiles show that for a low level of scattering the millimeter features of the target are reconstructed well. However, in the presence of high background levels the model retains several scattering events in the reconstruction, resulting in noise being clearly exhibited in the reconstructed target profile.
By further increasing the level of scattering above that shown in Fig. 4, the cross-correlation approach fails to accurately determine the depth information, making it difficult to recognize the shape of the target. Two examples are shown in Fig. 5, where the depth and intensity profile of a section of the flange were obtained with 6.2 AL (top row) and 7.2 AL (bottom row) between system and target. The first column shows the depth and intensity information obtained with the pixel-wise cross-correlation approach. In both of these environments, the recognition of the target is challenging due to the high level of scattering events, which leads to incorrect depth estimation in a large proportion of the pixels. Processing the images with the median filter model removed most of the incorrect depth values, allowing the target to be clearly distinguished from the scattering events. However, noise due to scattering in the proximity of the target is still retained in the reconstruction and it required additional processing. Therefore, the PA was then performed on the median filter depth and intensity profiles, allowing much improved recognition of the flange target at 6.2 AL and still detecting the target even at 7.2 AL. Table 2 summarizes the results obtained in the scattering conditions discussed in Fig. 4 and Fig. 5, showing the average per pixel of the photon return in the peak, the highest bin in the peak, and the background counts per bin. The calculation was performed over approximately 3000 pixels that contained target information. Defining the Signal to Background Ratio (SBR) as the ratio between the highest bin in the peak and the average background per bin, the table shows also that the average SBR decreases exponentially with the attenuation length between system and target, as expected. Fig. 4 and Fig. 5. The images were obtained by recording 1000 binary frames of 1 ms binary frame acquisition time. The average optical power was reduced in clear water conditions in order to avoid detector saturation effects. A quantitative comparison of the reconstructed images was performed by evaluating the reconstruction signal to noise ratio (RSNR), which compares the reconstructed images to a ground truth image, for example as defined in [43]:

Table 2. Summary of figures of merit of the results shown in
where x is the reference depth or intensity image,x is the reconstructed image, and ||x|| 2 is the 2 norm given by x T x. The RSNR was calculated for depth and intensity images obtained with the median filter model in each environment. Due to the high photon return and low background level, the filtered depth and intensity profile at 1.2 AL was used as the ground truth to compare the reconstructed images in scattering media. The graph in Fig. 6 shows the normalized RSNR calculated for the reconstructed depth images (black curve) and the reconstructed intensity images (red curve) at different attenuation lengths. All the measurements were performed using 1000 binary frames with 1 ms binary frame acquisition time. The average optical power entering the tank was 8 mW for the images obtained in the scattering environments, but was reduced to 0.4 mW for the ground truth in clear water due to the potential of detector saturation. As expected, the RSNR of the intensity decreases significantly after 4.5 AL, due to the high level of attenuation of light in water. However, the RSNR of the depth remains constant up to approximately 5.4 AL, and then progressively decreases up to 7.6 AL, the most attenuated data point measured. At values greater than 7.6 AL the median filter model failed to reconstruct the profile of the target because of lack of depth information from the target. A potential major disadvantage of the detector array approach compared to the previous scanning single-pixel optical system [13] is that the increased optical field of view will necessarily increase the level of scattering events detected, leading to a reduction in both image contrast and spatial resolution. The spatial resolution of the system was investigated by studying the intensity profiles of the Siemens star target (shown in Fig. 2(b)) at different attenuation lengths. As discussed previously, at high levels of attenuation it was difficult to obtain the intensity profile of the target with the pixel-wise cross-correlation approach due to the high proportion of scattering events detected. The low photon return from the target and the high background reduced the contrast of the image to the extent that it was not visible at over 4.5 AL. Therefore, the spatial resolution of the system was studied at 1.2 AL, 3.6 AL, and 4.3 AL. The corresponding intensity profiles of the target are shown in Fig. 7. The measurements were performed by acquiring 1000 binary frames, using a binary frame acquisition time of 1 ms. The average optical power leaving the system was 0.4 mW in the case of 1.2 AL, and then increased to 8 mW for higher levels of scattering. The intensity profiles in Fig. 7 show the effect of forward scattering on the spatial resolution and the effect of backscattering on the contrast of the images, making the target significantly less visible at 4.3 AL. In order to quantify this degradation, the achievable spatial resolution was estimated as the smallest bar width that the system is able to resolve, which can be determined by visual inspection of the plot of the number of counts in each pixel row. Approaching the center of the star, the peaks corresponding to the white bars of the Siemens star will merge making resolution of the bar width increasingly difficult. Using this method, an angular resolution of 0.46 mrad was estimated in clear water. Increasing the level of scattering, the angular resolution degraded to 0.77 mrad at 3.6 AL, and 1.23 mrad at 4.3 AL. As expected, the large optical field of view of the receiver system allowed forward scattered light to be misdirected into adjacent detector pixels whilst maintaining approximately the same time-of-flight, consequently degrading the spatial resolution of the system.
In order to study how the contrast of the image varies with the spatial frequency under the different scattering conditions, the contrast was evaluated for the three configurations. A geometrical progression of circumferences centered on the Siemens star was used to obtain the intensity variations corresponding to different frequencies. Due to the array architecture and the shape of the individual detector pixels having a 2:1 rectangular aspect ratio, the vertical direction had 192 pixels on a 9.2 µm pitch, whereas the horizontal direction had 128 pixels on an 18.4 µm pitch. Hence, we used the vertical dimension to analyze the contrast of the image. For each frequency, the contrast was calculated as: where I Max is the average of the peaks in the intensity array, and I Min is the average of the minima. The graph in Fig. 8(a) shows the contrast versus spatial resolution obtained at 1.2, 3.6 and 4.3 attenuation lengths. In this analysis, the detector dark counts have not been removed, however the dominant source of background is scattered photons in this case, hence detector dark noise correction had a negligible effect. As the level of scattering increased, the backscattered events are reflected back to the camera before reaching the target, reducing the contrast of the images. Additionally, at 1.2 AL and 3.6 AL the contrast decreases as the number of line pairs per millimeter increases, as expected. However, this behavior is less evident at 4.3 AL. This is likely due to the high level of scattering retained in the intensity image, which makes it difficult to accurately evaluate the contrast. The same analysis was performed on the image obtained with the median filter based model, and the results are shown in Fig. 8(b). The graph shows that when additional image processing is used, the contrast is generally lower, and decreases more rapidly with increasing spatial resolution than the non-filtered image, as the spatial correlations used for improved target reconstruction inevitably degrade spatial resolution. This behavior is observed in all three scattering environments analyzed.

Moving targets
The second experiment investigated the potential of the system to obtain depth and intensity information from moving targets immersed in different underwater environments. An example is shown in Fig. 9 where three extracted aggregated frames of the moving target flange are displayed in sequence, showing the depth (top row) and the intensity (bottom row) profiles. The images corresponding to a single aggregated frame of the final video were obtained by adding together 50 binary frames, and performing the pixel-wise cross-correlation approach to obtain depth and intensity profiles. The data were recorded while the flange was spinning at approximately 0.5 revolutions per second about a vertical axis in clear water, which was equivalent to 1.2 AL between system and target. The binary frame acquisition time was 1 ms and the average optical power entering the water tank was approximately 1 mW. The full video of the moving target is available as Visualization 1 in the supplementary materials. In the video, the depth of the target with respect to the reference is displayed in a 3D graph and the color map provides the intensity information. The moving target was analyzed under varying binary frame acquisition times using the pixel-wise cross-correlation approach, as summarized in Fig. 10. The average optical power entering the water tank was approximately 1 mW and three sets of measurements were performed with a binary frame acquisition time of 1 ms, 0.1 ms, and 0.01 ms. The equivalent binary frame acquisition rates were 500 Hz, 909 Hz, and 990 Hz, respectively. The first and second column of Fig. 10 show the depth profile of the flange in a single aggregated frame of the video obtained by combining 50 and 20 binary frames, respectively. The results demonstrate that the target is fully discernible in each depth profile, also at the shortest acquisition time. Visualization 2 in the supplementary material shows the video of the target obtained combining 50 binary frames, with each binary frame having an acquisition time of 0.01 ms.
The third column of Fig. 10 shows the depth profiles of the flange target obtained by using the cross-correlation approach on data from only a single binary frame, which meant an average of much less than one photon event per histogram. In this case, the average photon events per histogram (i.e. per pixel) was 0.56 for a 1 ms binary frame acquisition time, 0.15 for a 0.1 ms binary frame acquisition time, and 0.018 for a 0.01 ms binary frame acquisition time. As the acquisition time decreases, the shape of the target becomes less evident and it cannot be resolved at the shortest acquisition time. Visualization 3 shows the video of the flange obtained using a single binary frame of 1 ms acquisition time. In this video, only the depth is displayed because the intensity can only be equal to 1 or 0.
With increased levels of scattering, the return signal from the target was so highly attenuated that it was difficult to perform the analysis with a reduced number of binary frames or at short acquisition times. Therefore, to investigate the highly scattering scenarios, the measurements were performed using 1 ms binary frame acquisition time and the cross-correlation approach was performed on histograms formed by combining 50 binary frames. The results are shown in Fig. 11 for 3.6 AL, 4.8 AL, 5.8 AL, and 6.7 AL. The average optical power entering the water tank was approximately 1 mW when the attenuation of water was 3.6 AL between the transceiver and target, then it was increased to 8 mW for the higher levels of scattering. The first column shows the depth and intensity profiles obtained with the pixel-wise cross-correlation approach, illustrating the deleterious effects of scattering. The depth estimation becomes less accurate at higher attenuation levels and the intensity of the target gradually degrades. However, the depth information enabled the target to be reconstructed using the median filter model, and the results were further smoothed with the polynomial approximation, as shown in the second column of Fig. 11. Visualization 4 shows an example of reconstruction of the target at 4.8 AL, and Visualization 5 shows the reconstruction at 6.7 AL. In these highly scattering scenarios, the moving images of Visualization 4 and Visualization 5 are shown at 10 aggregated frames per second. While the target can be discerned in Visualization 4, several scattering events are retained in the reconstruction in Visualization 5, making it difficult to recognize the shape of the target. However, the video illustrates it is still possible to detect the target despite the high level of scattering in water. Fig. 11. Depth and intensity profiles of a section of the flange target at 3.6 AL, 4.8 AL, 5.8 AL, and 6.7 AL at a range of approximately 1.7 m. The first column shows on the same graph the depth and the intensity profiles obtained with the pixel-wise cross-correlation approach. The second column shows the depth and intensity information reconstructed with the median filter and smoothed with a polynomial approximation. The measurements were performed by acquiring a number of binary frames of 1 ms binary frame acquisition time. The average optical power entering the tank was 1 mW in the case of 3.6 AL, and 8 mW at higher levels of scattering.

Discussion and conclusions
This work presents the first depth and intensity profiles obtained in highly scattering underwater environments using a SPAD detector array. Reconstructed target profiles were obtained at binary frame acquisition rates approaching 1 kHz on stationary and moving targets. The detector used was a 192 × 128 format SPAD detector array with TCSPC electronics integrated in each pixel, giving the advantage of full-field time-resolved data acquisition. Several measurements were performed under dark laboratory conditions at a target distance of approximately 1.7 m, with increasing levels of scattering. The results demonstrate depth and intensity profiles of stationary targets at attenuation levels up to 7.2 AL between system and target, and profiles of moving targets up to 6.7 AL. The binary frame acquisition rate was varied from 500 Hz up to 990 Hz, providing depth and intensity information of moving targets. A limitation in the control firmware for the sensor meant that there was a delay between binary frames, which reduced the actual binary frame acquisition rate of the camera. Improvements to the firmware for the camera would enable faster binary frame acquisition rate, which allows a higher number of binary frames available for the depth estimation or for tracking rapidly moving targets.
The spatial resolution of the system was estimated in different configurations, demonstrating an angular resolution of 0.46 mrad in unfiltered tap water. However, as the level of scattering increases, the angular resolution decreased to 1.23 mrad when the attenuation was equivalent to 4.3 AL between system and target. This was due to the relatively large field of view of the system necessary for the detector array when compared to the single-pixel scanning system used in previous work. The reduced spatial resolution at increased levels of underwater scattering occurs as significantly more scattering events are necessarily detected in this optical configuration. Inevitably, the combination of SPAD detector array and a larger field of view enables rapid imaging of the target area but at the cost of reduced spatial resolution. Additionally, these results show the expected reduction in contrast when the scattering agent was added to water due to the increase in back-scatter. There is partly analogous to the modulation transfer function analysis of aerosol particles in free-space [47], and this will be subject to a more in-depth analysis in future work. Backscattering can be reduced by using time-gating on the detector array, allowing the detection of photon events in a temporal window corresponding only to the expected return from the target. Future work will investigate the effect of nanosecond detector gating on the backscattering events when imaging underwater with a SPAD detector array imager.
Compared with other approaches, the SPAD array technique described in this paper has exhibited full-frame data acquisition up to 990 Hz with single-photon sensitivity. This approach has demonstrated depth profiling at 6.7 AL at a target range of 1.7 meters whilst using only 8 mW average optical power. Other approaches such as pulsed-gated laser line scanning have achieved intensity imaging at 7.6 AL at 7 meters range [5], however much higher optical powers (> 1 W) were required and the overall frame rate was lower than that presented in this paper. Si-based EMCCD cameras have also been used in active imaging configurations to provide two dimensional imaging in clear water at 10 meters [7], giving the advantage of a high spatial resolution of 1024 × 1024 pixels, although without timing (i.e. depth) information. Fast-shutter CMOS cameras have been used to demonstrate high pixel resolution (1280 × 1024) range-gated imaging at up to 4.5AL at 8 meters with a laser source emitting 3.5 W average optical power [6]. Whilst we have described early demonstrations of the SPAD detector array approach, there is obvious potential for 3D profiling of moving targets at video frame rates in highly attenuated underwater scenarios. Longer range moving targets will be investigated by incorporating higher power laser sources, used in conjunction with detector time-gating to avoid the consequent increased backscatter from the vicinity of the sensor. However, additional modifications to the optical setup of the receiver must be considered for outdoor scenarios. The experiments described in this work were performed in a controlled laboratory environment in dark conditions. In a natural environment, daylight background would affect the measurements by introducing additional uncorrelated events in the histogram, reducing the signal-to-noise ratio. For outdoor usage, the optical setup of the receiver needs to be modified in order to include optical filters to further reduce the background, as previously demonstrated in free-space single-photon depth profiling measurements [8][9][10]15]. Additionally, the depth of field of the optical setup used in this work was limited to only a few centimeters, hence the optimum focus position was adjusted before each measurement. Therefore, further modifications to the optical setup of the transceiver should be made to remotely control the focus adjustments, and/or to obtain a longer depth of field.
The integration of TCSPC electronics in each detector pixel gives the advantage of parallel acquisition of time resolved data, but this comes at the cost of a reduced fill-factor, which was 13% in this case. This further reduces the overall detection efficiency of a single pixel, making the detection of light more difficult in photon-starved conditions. It was demonstrated that the fill-factor of this SPAD array can be increased to 42% by the use of microlenses and a stacked configuration [48], greatly improving the overall detection efficiency of the array. If used in conjunction with detector gating and restricted optical field of view, an enhanced fill-factor could improve the performance of system in scattering environments, and will be subject of future investigations. Alternatively, SPAD detector arrays with enhanced detection efficiency at the wavelengths used in this work could be considered for later work [49].
Future work will also include the examination of bespoke single-photon image processing algorithms using spatial correlations to extend the operating range of the system to higher scattering levels. This will include the integration of real time processing capabilities, which will allow this system to be used in a number of expanding application areas, such as optical subsea surveys or automotive lidar [50].