Time-resolved spectral-domain optical coherence tomography with CMOS SPAD sensors

We present the first spectral-domain optical coherence tomography (SD-OCT) system deploying a complementary metal-oxide-semiconductor (CMOS) single-photon avalanche diode (SPAD) based, time-resolved line sensor. The 1024-pixel sensor achieves a sensitivity of 87 dB at an A-scan rate of 1 kHz using a supercontinuum laser source with a repetition rate of 20 MHz, 38 nm bandwidth, and 2 mW power at 850 nm centre wavelength. In the time-resolved mode of the sensor, the system combines low-coherence interferometry (LCI) and massively parallel time-resolved single-photon counting to control the detection of interference spectra at the single-photon level based on the time of arrival of photons. As a proof-of-concept demonstration of the combined detection scheme, we show the acquisition of time-resolved interference spectra and the reconstruction of OCT images from selected time bins. We then exemplify the temporal discrimination feature, with 50 ps time resolution and 100 ps jitter, by removing unwanted reflections along the optical path at a 30 mm distance from the sample. The current limitations of the proposed technique in terms of sensor parameters are analysed and potential improvements are identified for advanced photonic applications.


Introduction
Low-coherence interferometry (LCI) is an indispensable tool for non-contact optical inspection of layered structures with micrometre depth resolution. To reveal the depth profile of the sample in question, the intensity of light backscattered from structures along the depth is evaluated. While in theory this could be achieved by measuring the intensity and echo time of light directly (e.g. by time-of-flight (TOF) measurement of photons), micrometre depth resolution would require femtosecond timing accuracy. Instead, LCI measures interference between light backscattered from the sample and light back-reflected from a surface at a reference depth. LCI has found widespread use in biomedicine, most notably through optical coherence tomography (OCT), which provides sub-surface, cross-sectional images of biological tissues. Since its inception in 1991 [1], OCT has become a powerful technique used across a wide range of medical fields including ophthalmology, dermatology, cardiology, and in vivo in situ optical biopsies of internal organ systems [2].
Early OCT systems applied a single detector element and measured the interference signal in time as the reference arm of a Michelson interferometer was scanned along the depth (time-domain OCT (TD-OCT)) [1]. In Fourier-domain OCT (FD-OCT) [3] the reference arm length is fixed and the backscattered light is collected from all depth levels of the sample at the same time. The sample's depth profile is revealed from the spectrum of the interferogram, relying on the Fourier-transform relation of a signal's autocorrelation function and its power spectral density. The spectral interferogram can be acquired either using a broadband light source and dispersing the interference signal on a line sensor (spectral-domain OCT (SD-OCT), Fig. 1), or with a single detector along with a narrow bandwidth light source swept across the spectrum (swept-source OCT (SS-OCT)). A clear advantage of FD-OCT systems over TD-OCT is a higher acquisition speed, as no mechanical scanning of the reference arm is required, enabling video-rate volumetric imaging and reduced motion artefacts. Moreover, FD-OCT offers a higher sensitivity [4]. The sensitivity depends on the OCT signal strength, which is related to the optical power from the source, the imaging speed (which is inversely proportional to the exposure time), and the sensitivity of the detector at the centre wavelength, with necessary trade-offs between them [3]. In an SD-OCT system, a typical sensitivity of around 100 dB can be achieved with milliwatt optical power in the near-infrared (NIR) regime and a few tens of kilohertz A-scan rate [5]. Higher A-scan rates (up to hundreds of kilohertz) are achieved at increased source power [6] or fewer spectral sampling points and hence shorter accessible depth range and/or lower depth resolution [7]. 
A data rate of over a million A-scans per second has been accomplished, while maintaining a high sensitivity of 113 dB, by measuring multiple A-scans belonging to different transverse positions in parallel using a 2D sensor array and a high-power source [8]. However, the target application often restricts the system parameters, such as the maximum illumination intensity allowed on biological samples or the minimum scan rate needed to avoid motion artefacts [9]. For high sensitivity, it is equally important to maintain a low system noise, which is attributed to the detector, the light source, and shot noise [4]. Shot-noise-limited detection can be achieved by tuning the reference arm reflection and using balanced detection schemes [10,11]. There has been a growing interest in low-noise detectors as well, especially in applications where photon budgets are low. Shot-noise-limited sensitivity was achieved in an unbalanced SS-OCT system by applying an electron-injection detector [12]. Single-photon counting (SPC) was introduced to OCT by deploying superconducting single-photon detectors (SSPDs) [13], allowing imaging with extremely low light levels [14]. The sensors of these systems had a single channel where the spectral content was separated in time, either by modulating the source wavelength [12] or by inducing a wavelength-dependent time delay with a fibre spool [14].
While maintaining a high sensitivity is crucial for imaging weakly scattering features of a sample, it is undesirable for samples producing strong optical reflections. In general, SD-OCT is limited by the finite dynamic range of the detector which can reduce the visibility of weak reflections in the presence of strong scattering interfaces along the depth. The finite dynamic range is even more of a problem when strong back-reflected/backscattered signals saturate the detector [15]. While high-speed cameras with high full well capacity (FWC) could remedy saturation problems, short integration times negatively affect the amount of signal collected from the weakly reflecting sites which may be of more interest. Different solutions have been proposed to overcome this issue based on post-processing [16], automatic control of the reference signal intensity [17], or using a dual-line camera [18]. Muller and Fraser [19] applied time-gating to restrict the detection of backscattered signals to a selected depth region using a nonlinear crystal. Currently, time-resolved single-photon avalanche diode (SPAD) sensors achieve millimetre depth resolution in TOF measurements [20,21]. Therefore, time-gated SPAD sensors can be used to remove the unwanted reflections outside the depth region of interest and overcome saturation problems thus effectively increasing the dynamic range.
Complementary metal-oxide-semiconductor (CMOS) SPAD arrays achieve single-photon detection over several hundreds of parallel channels [22]. Moreover, CMOS SPAD technology allows the integration of sub-nanosecond timing circuits, enabling either time-correlated single-photon counting (TCSPC) or time-gating to be deployed on a per-pixel basis. The advantages of spectrally and temporally resolved SPC with CMOS SPAD line sensors have already been demonstrated in numerous applications including time-resolved fluorescence spectroscopy [23,24], spectral fluorescence lifetime imaging microscopy (FLIM) [25], time-resolved Raman spectroscopy [26,27], and laser-induced breakdown spectroscopy [28]. Establishing the time correlation of each detected photon with laser pulses or other events has been exploited to study scattered events as well [29-34]. Therefore, there is untapped potential in combining LCI with arrays of time-resolved single-photon counters. In this paper, we present for the first time an SD-OCT system using a grating-based spectrometer and a time-resolved CMOS SPAD line sensor.
In SD-OCT there is a trade-off between the imaging depth and the depth resolution governed by the number of spectral sampling points (i.e. sensor pixels), with a typical depth range of a few millimetres [35]. Imaging at a depth outside this region requires changing the reference arm length by moving the reference mirror. However, multiple shifted positions of the reference surface can be constructed using a multi-reflection arrangement and time-resolved detection. Our overall approach is illustrated in Fig. 2. The presence of multiple reflective surfaces in the reference arm, along depths d1, d2, etc., allows a spectral interferogram to be obtained for each depth separately by deploying time-gating or TCSPC, assuming a sample surface shape that overlaps several of those depth regions or a transparent sample. The combined detection scheme has the potential to make significant contributions towards advanced interferometric applications like OCT, but also towards applications where the surface shape is of more interest. As a concept, the depth resolution of 3D ranging could be increased with a multiple-reflection reference path. Alternatively, interference could be established between optical signals arriving back from neighbouring spatial positions of the 3D scene, similarly to the computational evaluation of the correlation between spatial positions of the scene in first-photon imaging [36]. Improvements in OCT may involve overcoming the depth-range (versus axial resolution) limitations imposed by the finite number of detector pixels in SD-OCT, or increasing image contrast by gating off strong reflections based on their time of arrival [19].
As a demonstration of the proposed platform, first, we show how OCT images can be acquired using backscattered signals extracted from spectrally and temporally resolved photon count histograms, using a single reference surface. Next, we focus on a practical application to remove unwanted reflections originating from the sample and depths along the sample arm. As proof of concept, we demonstrate the removal of unwanted reflections from along the optical path using the temporal masking feature of our SPAD line sensor.

Time-resolved CMOS SPAD line sensor
The sensor comprises 1024 × 8 SPADs, each having a diameter of 8.88 µm. The photon detection efficiency (PDE) at 850 nm wavelength is around 1.4 % [37] and the median dark count rate (DCR) of the SPADs is 2230 cps (counts per second) at 1.2 V excess bias voltage [38]. Full characterisation of the sensor was performed by Erdogan et al. [37].
For time-resolved detection, pixels are formed from two columns of eight SPADs, with an area of 23.78 µm × 95.12 µm and a fill factor (FF) of 49.31 %. Each of the 512 pixels contains a histogram block of 32 time bins with a configurable bin width of 51.2 ps to 6.550 ns. The histogram blocks allow up to 1 photon per laser pulse to be recorded, overcoming the throughput limitation of traditional TCSPC [37]. In the temporal masking experiments, each column of 8 SPADs was used to form 1024 pixels. The counting of photons can be temporally masked after detection using each pixel's dedicated masking signal, which disables the photon counters (as opposed to gating, where the SPADs are disabled by lowering their bias voltage). The position and width of the temporal mask are set by specifying delays for its edges relative to the sync signal of the pulsed light source, with a mean temporal resolution of 62.81 ps [37]. Coarse timing alignment was performed by delaying the laser sync pulse with a delay box of 500 ps resolution (DB64, Stanford Research Systems, USA) in both the fully time-resolved and time-masked acquisitions.
The exposure time was set to 1 ms per A-scan in both experiments. This value was chosen experimentally to ensure sufficient sensitivity while preventing deterioration of the A-scans caused by motion artefacts. Data from the sensor are transferred via a 64-bit parallel bus to custom firmware on a Spartan-6 field-programmable gate array (FPGA) (Xilinx Inc., USA, hosted on an XEM6310 FPGA integration module from Opal Kelly, USA), and from the FPGA to a PC via a USB 3 link. The acquired data were processed on the host computer using custom software written in Matlab (MathWorks Inc., USA, Release 2018a).

Optical setup
The optical setup is depicted in Fig. 3. The light source was a WhiteLaseMicro supercontinuum laser (NKT Photonics-Fianium, UK). The laser light was filtered (with filters FESH0900, FEL0700, and FB850-40 from Thorlabs Ltd, UK) to collect the desired spectrum of ~38 nm full width at half maximum (FWHM) bandwidth centred at 850 nm, measured by a commercial spectrometer (Flame, Ocean Optics). This gives a theoretical axial resolution of the OCT system of 8.4 µm in air, assuming a Gaussian beam profile [39]. The filtered output of the laser was coupled (using a 10X Nikon achromatic finite-conjugate objective) into a 50:50 single-mode fibre coupler (FC850-40-50-APC, Thorlabs Ltd, UK) of 5 µm core, providing 950 µW of optical power in each arm of the fibre coupler (measured after the collimating lenses using a commercial power meter, PM100D with S130C, Thorlabs). To avoid saturation of the detector, an additional neutral density (ND) filter (NE02A-B, Thorlabs) was placed in the reference arm. The light exiting the sample arm fibre was collimated with a fixed-focus collimation package (F280APC-850, Thorlabs Ltd, UK) and focused on the sample with an objective lens of 50 mm focal length (AC254-050-B-ML, Thorlabs). The lateral resolution was estimated (as the diffraction-limited spot size at 850 nm [40]) to be 13.6 µm. Transversal scanning of the laser beam across the sample was carried out using a galvanometer scanner (6210h, Cambridge Technology, USA). The firmware and the digital-to-analogue converters on the sensor's printed circuit board controlled the scanner and synchronised it with the sensor readout. The number of steps and the step size of scanning can be set through the firmware. In the time-masked experiments, we used 300 scan positions and a step size of 13 µm. Identical collimating and focusing lenses were used in the reference arm to minimise dispersion mismatch.
A coverslip was placed between the sample and the focusing lens of the sample arm to mimic strong, unwanted reflections, similar to those we would want to remove using temporal masking (e.g. saturating reflections from the corneal apex [15]). Interfering reflections from the two sides of the coverslip produce an autocorrelation signal in the A-scans. The signal appears as a horizontal line in the OCT images (see Fig. 5) at a depth governed by the thickness of the coverslip, just as in common-path OCT [41]. In this way, the coverslip reflection and the depth profile of the sample (fingertip) are superimposed in the OCT images, independently of the actual distance between them.
A custom spectrometer [24,41] was built to measure the spectrum of the interference signal, consisting of an achromatic doublet (AC254-050-B-ML, Thorlabs) for collimation, a volume phase holographic transmission grating (1200 lines/mm, 840 nm centre wavelength, Wasatch Photonics, USA) for dispersion and an achromatic doublet (AC254-150-B-ML, Thorlabs) for focusing of light. The wavelengths covered by sensor pixels were confirmed by connecting the supercontinuum output to the spectrometer through a tunable filter (LLTF Contrast, NKT Photonics, UK). The depth range provided by the above configuration was 3.23 mm (when using 1024 pixels). The efficiency of the spectrometer was measured to be 85 %. For this, the bandwidth-limited laser output was directly connected to the spectrometer and the optical power was measured at its entry point and in the focal point using the power meter.
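The quoted depth range can be checked against the standard SD-OCT maximum-depth relation $z_{max} = \lambda_0^2/(4\,\delta\lambda)$, using the per-pixel spectral width of 0.0561 nm quoted in the sensitivity analysis below. A quick sanity check (not part of the original calibration), with variable names of our own choosing:

```python
# Sanity check of the spectrometer depth range: z_max = lambda0^2 / (4 * delta_lambda),
# the standard SD-OCT relation for a spectrum sampled with per-pixel width delta_lambda.
lambda0 = 850e-9   # centre wavelength (m)
dlam = 0.0561e-9   # spectral width covered by a single pixel (m)

z_max = lambda0**2 / (4 * dlam)
print(f"depth range: {z_max * 1e3:.2f} mm")  # ~3.22 mm, matching the quoted 3.23 mm
```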
For the theoretical sensitivity analysis, the signal-to-noise ratio (SNR) was determined first, Eq. (1), as the signal power ($S^2$) divided by the total noise variance ($\sigma^2$) introduced by the detector ($\sigma^2_{det}$), shot noise ($\sigma^2_{shot}$) and the relative intensity noise (RIN) of the source ($\sigma^2_{RIN}$). Equations (2)-(5) express these terms in the space (depth) domain for a single reflective surface, similarly as described by Leitgeb et al. [4] and de Boer et al. [3]. The numbers of incident photons and dark counts at a pixel are both treated as Poisson random variables; hence the noise variances of the shot noise and the detector noise are taken as the mean values of the incident and dark counts, respectively. These are then converted to the space domain and taken at a single depth location (of the hypothetical reflective surface). The signal power and the noise terms are expressed in squared photon counts. The following notation is used: $\eta$ is the spectrometer efficiency (0.85), $\xi$ is the PDE at 850 nm (0.014 [37]), $P_r$ and $P_s$ are the optical powers in the detection arm from the reference arm and the sample arm, respectively, $\tau$ is the integration time (1 ms), $N$ is the number of detector pixels (1024), $E_\nu = hc/\lambda_0$ is the energy of a single photon at the centre wavelength $\lambda_0$ (850 nm), with $h$ the Planck constant and $c$ the speed of light in vacuum, $D$ is the median of the measured DCR of each pixel (2230 cps [38]), and $\tau_{coh} = (2 \ln 2/\pi)\,\lambda_0^2/(c\,\delta\lambda)$ is the coherence time, with $\delta\lambda$ being the spectral width covered by a single pixel (0.0561 nm).
Assuming perfect reflection from the reference arm mirror and neglecting coupling losses, the power from the reference arm can be written as $P_r = P_0 a^2/2$, where $P_0$ is the optical power entering the arm (950 µW) and $a$ is the attenuation of its ND filter (0.0501), which is passed twice. The attenuation produced by the ND filter was assessed separately by placing it directly in the path of the bandwidth-limited laser beam and measuring the optical power before and after the filter. Similarly, we write the power from the sample arm as $P_s = P_0 R_s/2$, with $R_s$ being the sample arm reflectivity. From here, the theoretical sensitivity was expressed as the reciprocal of the sample arm reflectivity ($1/R_s$) that produces the smallest detectable signal, i.e. for which the SNR equals one.
The sensitivity of the OCT system was also evaluated experimentally, by placing a mirror and an ND filter (NE40A-B, Thorlabs) in the sample arm. A single A-scan was acquired with a 1 ms integration time, showing the peak belonging to the reflection from the sample arm mirror. The sensitivity was measured as the height of this peak relative to the second-highest peak of the A-scan (belonging to noise), plus the attenuation caused by the sample arm ND filter (measured separately with the power meter). The instrument response function (IRF) of the system was measured to be 249 ps as a mean value over the pixels [38]. During the experiments, a thin layer of glycerol was applied to the sample (the pulp of a finger).

Data processing
To remove fixed-pattern noise and the DC component of the A-scans, a background line was subtracted from each acquired interference spectrum. In the case of the fully time-resolved measurements, the background line was acquired from an average of 300 histograms. For this, only the reference arm was enabled, and photon counts were taken from the same time bins as those where the interference spectra are located when using both arms. Similarly, the average of 300 time-masked spectra, with only the reference arm enabled, was calculated for the background line of the time-masked measurements.
Each background-corrected line was resampled using spline interpolation before Fourier transformation to rely on data sampled in k (frequency) space instead of wavelengths. The resampled data were further filtered with a Hann window to prevent spectral leakage. After this, fast Fourier transform (FFT) was performed on the spectra (with no zero padding) to get the desired depth profiles. Matlab scripts implementing the processing routines are available along with raw data from the sensor [42].
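The resample-window-transform pipeline can be sketched with numpy/scipy on a synthetic single-reflector fringe standing in for a background-corrected measured spectrum (variable names and the toy reflector depth are our own; the published Matlab scripts [42] are the reference implementation):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Minimal sketch of the A-scan pipeline: spline resampling from wavelength to
# uniform wavenumber k, Hann windowing, then FFT to the depth domain.
n_pix = 1024
lam = np.linspace(831e-9, 869e-9, n_pix)  # wavelengths across the line sensor (m)
k = 2 * np.pi / lam                       # wavenumber, decreasing with wavelength
z0 = 0.5e-3                               # toy single-reflector depth (m)
spectrum = np.cos(2 * k * z0)             # interference fringe sampled in lambda

# Resample onto a uniform k grid; CubicSpline requires ascending abscissae.
k_uniform = np.linspace(k[-1], k[0], n_pix)
resampled = CubicSpline(k[::-1], spectrum[::-1])(k_uniform)

# Hann window against spectral leakage, FFT (no zero padding) for the A-scan.
a_scan = np.abs(np.fft.fft(resampled * np.hanning(n_pix)))[: n_pix // 2]
peak_bin = int(np.argmax(a_scan[1:])) + 1  # skip the residual DC bin
# The peak lands near bin z0 * (k_max - k_min) / pi (~53 for these parameters).
```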

Results
In the time-resolved measurements, histograms of photon counts versus time and wavelength are recorded at each scanning position (an example of these histograms at a random scan position is depicted in Fig. 4a). Time-resolving the interference signal allows arbitrary filtering of the backscattered signals based on their temporal content. Spectral lines can be formed, e.g., by selecting signals from a certain depth region located in certain time bins, or each line of bins can be processed separately. In this proof-of-concept demonstration, we simply choose to use all bins where the backscattered signal from a fingertip is located (Fig. 4b). Finally, each interference spectrum is transformed into an A-scan (Fig. 4c) using the previously described processing steps to obtain the final OCT image (Fig. 4d).
The quality of the OCT image in Fig. 4d is suboptimal when compared to state-of-the-art SD-OCT (for example in [35]), since CMOS SPAD sensors have a lower equivalent quantum efficiency (QE). However, one needs to take into account that the detection efficiency of SPADs in the NIR region is developing rapidly [37,43-46]. As an example, the 31.4 % PDE at 850 nm reported in [46] is roughly 20 times higher than that of the SPADs used in this study. We discuss sensitivity and ways forward further below. Here, the advantage is that time-resolved data acquisition offers a means for altering the data before generating the spectral lines, based on the time of arrival of photons. In this demonstration the highest resolution (50 ps per bin) was used, providing a 1.6 ns time range of the histograms. The sensor allows this time (and depth) range to be extended at the expense of bin resolution and, consequently, the time-sectioning capability. Note that the backscattered signal overlaps several time bins (see the histogram in Fig. 4a). This is partly due to the jitter of detection, which limits the separation of photons backscattered from different depth levels, but also because of multiple scattering of photons travelling through tissue. Considering the former, let us assume that two optical signals can be distinguished if they arrive at the detector with a minimum time difference equal to the FWHM of the detection jitter. Expressed in depth instead of time, a detection jitter of 100 ps (FWHM) equals a distance of 15 mm in air, or 10 mm in a medium with a refractive index of 1.5 (a rough estimate for tissue). This highlights the importance of jitter: features along the depth that are closer than the distance defined by the jitter cannot be distinguished based on their time of arrival, even though the resolution of the sensor's timers may be better (e.g. 50 ps).
For demonstrating time-masked acquisition, Fig. 5 (left) shows the composite image of the fingertip and the coverslip without any temporal discrimination. Immediately after taking the non-masked image, another image was acquired with the temporal mask turned on (Fig. 5 (right)). In the time-masking experiment, the distance between the coverslip and the fingertip was 30 mm. The experiment was repeated with 15 mm and 8 mm distances between the fingertip and the coverslip (not shown). At these distances, photons back-reflected from the coverglass could not be perfectly differentiated from photons backscattered from tissue, as these signals overlap in time due to jitter. The IRF inferred from these distances is in good agreement with the measured mean IRF of 249 ps. However, partial removal of the coverslip reflection (i.e. the reduction of its intensity) was still possible at an 8 mm distance from the fingertip. Shorter distances were not evaluated given the technical difficulty of aligning the mask signal with sufficient accuracy. Even though non-linear optics provide more accurate timing [19] and depth sectioning, state-of-the-art SPADs already achieve better timing performance with sub-100 ps jitter [47-49]. A noticeable aspect of the time-masked image is a higher noise level that also differs between A-scans. This is believed to be caused by slight variations in the timing of the mask windows relative to the time profile of the optical signal at each pixel (similar to the timing skew described by Nissinen et al. in time-gated Raman spectroscopy [27]). A mismatched mask length and/or position would cause a slightly different amount of signal at each pixel compared to the case of uniform masks. The per-pixel intensity difference from this effect, when serially uncorrelated, produces broadband noise after the Fourier transformation and hence an increased noise floor of the A-scans. The theoretical sensitivity is calculated to be 103.83 dB.
The calculations also reveal that the noise of the detector and the light source is much lower than the shot noise (Eq. (4)), suggesting shot-noise-limited sensitivity. The measurements also suggest that the main limiting factor is shot noise. This can be illustrated in the frequency domain, where the pixel with the median DCR yields 2-3 dark counts and about 2000 incident photon counts for an exposure time of 1 ms. However, the measured sensitivity of 87 dB is lower than the theoretical value, which is likely caused by the DCR distribution of the pixels. Equation (3) assumes that the detector noise is white, so that the noise power at a depth sampling point is equal to that at a spectral sampling point (i.e. a single pixel), according to Parseval's theorem. In reality, however, the pixels do not behave uniformly concerning their DCR, with some pixels yielding a DCR that is several orders of magnitude higher than that of most pixels [38]. A higher DCR also means a higher variance of the dark counts at these pixels, even though the mean value of the dark counts is removed during the background line subtraction (i.e. the expected value of dark counts will be zero along the spectrum just before the Fourier transformation; however, some pixels deviate from this expected value more than others). In effect, this increases the noise floor of the A-scans compared to a case where all pixels have the same mean and variance of dark counts. High-DCR SPADs of the sensor pixels can be turned off individually for improved noise performance [50], which we aim to test in follow-up studies to achieve higher sensitivity.
Apart from minimising the detector's contribution to noise, Eqs. (1)-(5) suggest that, for a given incident power, the sensitivity can be increased by using longer exposure times or a higher PDE. In general, longer exposure times allow more light to be captured and increase the ratio of signal to shot noise. However, integrating and SPAD-based sensors behave differently regarding the total amount of light they can detect. In integrating sensors, such as charge-coupled devices (CCDs), the detectable optical energy (i.e. the integral of the detected optical power over time) is limited by the FWC of the pixels. In contrast, with a photon-counting SPAD sensor, the detected counts (and therefore the OCT sensitivity) can be increased through ever longer exposure times. The maximum range of the device's digital counters does not impose a limit either, since SPAD sensors exhibit no readout noise, which makes multiple subsequent exposures and readouts equivalent to one long exposure. In fact, acquiring multiple spectra at a scan position and averaging the related A-scans permits keeping the exposure time short for a single A-scan. In effect, a high SNR can be maintained without degradation caused by fringe washout [9]; however, this still requires a long overall acquisition time, during which movement of the sample and related motion artefacts could be problematic (no averaging was performed in this study). Therefore, the use of long (total) exposure times is detrimental. Instead, it is necessary that a high number of photons can be processed in a short time interval and that the sensor can resolve strong bursts of photons.
In practice, even though the measured sensitivity expresses the lowest detectable signal as a fraction of the strongest possible signal with a given optical source, the sensor may not be able to tolerate the strongest optical power due to its finite saturation level and thus limited dynamic range. In the case of SPADs, the level of saturation is determined by their deadtime. Multiplexed pixel architectures, where multiple SPAD devices are connected within the pixels (such as in our sensor), can efficiently increase the count rate at which the detector saturates [37]. Alternative recharging mechanisms have also been shown to increase the measured count rate [51]. These solutions are advantageous provided that the incident photons are dispersed in time and the SPADs of a pixel do not fire simultaneously. With pulsed sources, however, where the photons arrive at the same time, the highest number of photons that can be processed within a certain time frame is set by the laser repetition rate (assuming that the deadtime is shorter than the laser period). We can estimate the dynamic range of a SPAD pixel in this case by maximising the received signal (see Eq. (5) and (14) of [3]) as the number of laser pulses during the exposure time, which gives a dynamic range estimate of 70.1 dB for a repetition rate of 20 MHz, an exposure time of 1 ms and considering shot noise only. For increased dynamic range, 2D SPAD arrays can be deployed with multiple parallel processing channels per spectral point, i.e. multiple independent pixels per spectral sampling point. Broadband supercontinuum laser sources with significantly higher repetition rates (up to 325 MHz) are also readily available [6,52,53]. Novel pixel structures can also alleviate the challenge of resolving high-power optical pulses.
An increased dynamic range was achieved by dual-layer SPAD pixels where the pairs of SPADs at the front and backside of the sensor have a different PDE due to attenuation caused by the layers of the integrated circuit [49], and with dual-mode pixels where linear diodes and SPADs are combined [54].
The PDE is the product of the FF (the sensitive area over the entire area of a pixel) and the photon detection probability (PDP) (the probability of a photon producing a detected count, a metric similar to QE). Recent developments in CMOS and SPAD technologies have enabled highly efficient SPAD structures, concerning both FF and PDP. As an example, superior FF has been demonstrated with 3D-stacked SPAD sensors [45,55-57]. SPADs with high PDP in the NIR and infrared (IR) wavelength regions are highly sought after by applications such as optical communication, diffuse optical tomography, and light detection and ranging (LIDAR) in particular, leading to the appearance of novel SPAD structures with highly improved NIR sensitivity [46]. SPAD devices using other materials, such as InGaAs-InP SPADs, have also been investigated for efficient single-photon detection at NIR wavelengths and beyond [58,59]. OCT applications using visible light [60], where the PDP is higher, may benefit from the use of SPAD sensors as well.

Conclusion
We demonstrate, for the first time, the combination of time-resolved single-photon counting (SPC) and low-coherence interferometry (LCI) using a complementary metal-oxide-semiconductor (CMOS) single-photon avalanche diode (SPAD) line sensor in a spectral-domain optical coherence tomography (SD-OCT) setup. At present, our system has suboptimal sensitivity when compared with OCT systems using traditional charge-coupled device (CCD) and CMOS sensors. To maintain a sufficient sensitivity of 87 dB, exposures of 1 ms were required, preventing the A-scan rate from reaching levels similar to state-of-the-art SD-OCT systems. Increasing the photon detection efficiency (PDE) allows weaker backscattered signals to be detected. While SPAD sensors have come a long way in that regard, further improvements are required to reach a quantum efficiency (QE) and fill factor (FF) competitive with CCD/CMOS cameras. Regarding the noise of the sensor, SPADs are free of read noise due to the digital nature of detection. In addition, the number of dark counts is low for most of the pixels of the recorded spectra, both because of our sensor's dark count rate (DCR) and the short exposures that are typical in OCT. However, non-uniform DCR across the pixels may lead to undesired effects, increasing the noise floor of the A-scans and decreasing OCT sensitivity. To prevent saturation and to benefit from the sensitivity increase achieved by higher sample illumination power, the device needs to be able to handle strong optical signals. This is more challenging for SPADs than for integrating sensors, as they need to be quenched and recharged after the detection of photons. As an example, to provide the same performance as a CCD/CMOS pixel with a full well capacity (FWC) of 100 × 10³ electrons at an exposure time of 100 µs, a SPAD pixel should not spend more than 1 ns being insensitive to further photons after a detection (i.e. a 1 ns deadtime).
SPAD sensors, on the other hand, offer massively parallel time-resolved SPC, which enables unique capabilities for the spatial/temporal filtering of backscattered spectra. As proof of concept, we demonstrate how OCT images are acquired from time-resolved spectral photon count histograms and exemplify the approach by suppressing unwanted reflections using the time-masking feature of the sensor.
Continuing work on the increased integration of CMOS SPAD structures (e.g. 3D stacking), while reducing jitter and improving time resolution and uniformity, will contribute to the effort to combine interferometry and massively parallel time-resolved SPC. We anticipate that the platform will be useful for studying the nature of tissue scattering, as improved timing with SPADs will allow depth sectioning with higher precision. For example, a temporal jitter of 20 ps would be equivalent to a distance of 3 mm in air (or ~2 mm in tissue). We also believe that merging photon counting, time-resolved SPC, and interferometry using CMOS SPADs will find applications in numerous fields including quantum optics, 3D ranging, and biophysics. Finally, combining common-path interferometry with time-resolved detection will be useful for both fluorescence and backscattering scenarios.