Probe-hosted large area silicon photomultiplier and high-throughput timing electronics for enhanced performance time-domain functional near-infrared spectroscopy

: Two main bottlenecks prevent time-domain diﬀuse optics instruments to reach their maximum performances, namely the limited light harvesting capability of the detection chain and the bounded data throughput of the timing electronics. In this work, for the ﬁrst time to our knowledge, we overcome both those limitations using a probe-hosted large area silicon photomultiplier detector coupled to high-throughput timing electronics. The system performances were assessed based on international protocols for diﬀuse optical imagers showing better ﬁgures with respect to a state-of-the-art device. As a ﬁrst step towards applications, proof-of-principle in-vivo brain activation measurements demonstrated superior signal-to-noise ratio as compared to current technologies.


Introduction
In the last years, new low cost photonics instruments for clinical monitoring of vital parameters (such as tissue oxygenation) as well as for diagnostics (e.g. tumor detection) have been proposed [1][2][3][4][5]. In this context, diffuse optics aims to study the composition of biological tissues through the recovery of the absorption (µ a ) and reduced scattering (µ s ') coefficients [6,7], which are related to tissue chemical composition and microstructure, respectively.
Most of the proposed systems (even wearable ones) are based on the continuous wave (CW) approach due to the inherent lower cost and simplicity of the instrumentation [8]. Yet, CW instruments suffer from issues like vulnerability to superficial artifacts [9,10], lower depth sensitivity [11] and lack in precise quantitation [12]. However, some of these issues have been solved using suitable strategies (e.g. multi-distance or tomographic measurement schemes to uncouple superficial artifacts from deep signals [13] or hybrid approaches to improve quantitation [14]). In the last years, CW instruments have reached very high performances as demonstrated for example in Ref. [15]. However, the time-domain (TD) approach is considered to offer higher informative content and more promising performances [11,16].
The TD method relies on the injection of short laser pulses (duration of few tens/hundreds of ps) in the sample and on the reconstruction of the temporal distribution of the re-emitted photons (distribution of time-of-flight, DTOF). This approach has several advantages such as disentangling the specific effects of absorption and scattering phenomena, and encoding the mean depth probed by photons in their arrival time. As demonstrated in Ref. [11], the TD technique has the potential to reach depths of about 6 cm in the ideal case -i.e. i) a dense coverage of pulsed sources and detectors onto the probe to maximize the amount of injected light; ii) a high-throughput timing electronics to process all photon counts without losses; iii) a time-gating capability to collect photons at null source-detector separation distance [17]. Conversely, using an ideal CW system under the same conditions, the penetration depth is limited to less than 3 cm. The improvement in depth sensitivity achievable with new generation TD instruments will allow one to explore completely new applications for in-vivo non-invasive diagnostics (e.g. monitoring of deep organs such as lungs or heart), which are not easily addressed by optical methods.
Compared to CW, TD systems are however more complicated, cumbersome and costly. Indeed, many of them are laboratory prototypes (see, for example, Ref. [18][19][20][21][22]) hosted inside rack cabinets. However, in the last years, several papers showing first attempts for source miniaturization and detector probe-hosting [23][24][25], as well as a first compact and cost-effective systems have been published [26], thus demonstrating the possibility to decrease both cost and size of TD systems, still maintaining excellent performances.
Two main bottlenecks prevent the full exploitation of the TD technique, that are: i) the limited light harvesting capability of the detection chain; ii) the limited data throughput of the timing electronics. These two limitations have to be overcome jointly for achieving improved performances. A large area detector can actually provide a large number of photon counts to a timing electronics, but this latter must sustain very large count-rates (CRs) to avoid losses of information.
For what concerns the detectors, the choice is between photo-cathode based devices (e.g. Photomultiplier Tubes -PMTs-, hybrid PMTs and Micro-Channel Plate PMTs) and solid-state detectors (Single-Photon Avalanche Diodes -SPADs-and Silicon Photomultipliers -SiPMs-). The former ones feature a large active area, which allows them to have a high light harvesting capability. However, they are bulky and not scalable in dimensions (preventing probe-hosting, thus requiring the use of optical fibers with limited numerical aperture, reducing the photon collection efficiency), sensitive to electromagnetic fields and mechanical shocks and require a high biasing voltage (in the order of kV) to be operated [16]. All these characteristics make them not suitable for next generation of TD-instruments which aim to be scalable, low-cost and wearable. In the last years, solid-state detectors have been considered the best solution for new TD-systems [16]. Up to few years ago, SPADs were the only solid-state detectors used in TD diffuse optics [27]. However, they were not the first choice for diffuse optical measurements due to their small active area (diameter of few hundreds of micrometers in the best cases [28,29]) to maintain performances suitable for diffuse optical applications). In 2015 this bottleneck was overcome thanks to the introduction of SiPMs in TD diffuse optics [30]. Indeed, SiPMs are arrays of several SPADs (hundreds or thousands) all connected in parallel and with a high fill factor (usually larger than 50%). For this reason, they have the advantages of solid-state detectors (scalability, ruggedness and immunity to electromagnetic field), while overcoming their main limitation, that is the small active area. The typical active area of the SiPMs used in diffuse optics are in the order of 1 mm 2 (see, for example, Ref. [31][32][33][34]). The improvement in the active area (from µm 2 to about one mm 2 ), coupled to the possibility to place the detector directly in contact with the sample, increase the light harvesting capability reaching figures even higher than those obtained with state-of-the-art fiber-based systems [30].
The maximum achievable throughput is another issue which sets a limit to further boost the performances of TD systems. Actually, in modern TD instruments the timing electronics adopts two different strategies: time-to-analog converters [35,36] and time-to-digital (TDC) converters [37]. The former usually features better timing resolution and linearity at the expenses of cost and size. On the other hand, TDC based electronics can be integrated in a single chip, thus making it suitable for monolithic and/or multi-channel systems [16]. Despite the different characteristics of those two technologies, nowadays they show overall comparable performances in terms of maximum achievable count-rate, whose limits are usually around few millions of counts per second (Mcps).
Several works have been published showing the possibility to integrate detectors (usually a matrix of small area SPADs) with high speed TDCs [16]. Restricting the field to the devices specifically designed for photon timing application and considering the use of the entire SPAD array as a unique larger area detector (i.e. not for imaging purposes), Table 1 reports the main performances (in terms of overall sensitive area and throughput) of the high-throughput devices recently published. In some cases, single basic units (a small array of detectors and TDC) are parallelized to improve the overall performances of the system [38,39]. From Table 1, it is possible to notice that in all cases the overall detector area is always below 1 mm 2 . Focusing on discrete detectors, Ferocino and co-workers [40] demonstrated the possibility to increase the light harvesting capability using an arrangement of 8 separate SiPMs, thus covering an area of 13.5 mm 2 (with a fill factor of 74%, thus resulting in about 10 mm 2 sensitive area). However, in this case, each SiPM provides a single output which has to be summed up with the other seven in post processing. This solution allows one to reach an overall throughput of 40 Mcps, but at the expenses of system complexity and cost as eight separate timing channels are needed (in this case, 8 inputs of a commercial TDC).
Concerning developments in data throughput capabilities of TD systems, a high-throughput gated counter capable to sustain up to 100 Mcps was published and validated for diffuse optical measurements [41,42]. In this case, however, it is not possible to reconstruct the whole DTOF since only the overall number of photons within two well defined time-windows are saved. Such a solution can be profitably used in applications where the whole waveform is not needed (e.g. muscle or brain monitoring), but it cannot be used in applications where a spectroscopic approach is required since it does not permit to recover both µ a and µ s '.
As shown above, recent works show improvements for both detectors and timing electronics but, at the best of our knowledge, the two bottlenecks have been addressed separately. In this paper, we demonstrate for the first time the possibility to couple a large area detector with a high-throughput electronics.
To this purpose, here we use a 3 × 3 mm 2 SiPM detector (i.e. the largest area device from which we were able up to now to obtain sub-nanosecond single-photon timing resolution, with 82% fill-factor, resulting in an overall active area of 7.38 mm 2 in a single device) to improve the light harvesting and an innovative time-correlated single-photon counting (TCSPC) device, which can sustain up to 180 Mcps with a very short dead time (650 ps). This system is the first prototype laying the foundation for the next generation of TD system with high performances.
In this paper, we present the system and characterize its performances using internationally agreed protocols for TD instruments such as BIP, MEDPHOT and nEUROPt [47][48][49]. Further, we challenge it for in-vivo motor cortex activation tests (finger tapping exercise) to show the improvement in signal as compared to a state-of-the-art system based on a 1 mm 2 fiber-coupled SiPM detector.
The paper is organized as follows: Section 2 deals with the description of the system, the phantoms and the protocols for in-vivo measurements, together with data analysis; Section 3 reports the results and discussion of phantom characterization, as well as of the in-vivo acquisitions. Finally, in Section 4 conclusions and future perspectives are presented. Figure 1(a) reports the schematics of the experimental setup. A pulsed laser driver (PDL 828 Sepia II, Picoquant GmbH, Germany) provided pulses at 40 MHz to two high power laser heads operating at 670 and 830 nm (LDH-P-C-670M and LDH-P-C-830M, Picoquant GmbH, Germany). Each laser head output was directly coupled to an optical fiber (core diameter 600 µm, step index), which serves as the input of a fiber-to-fiber u-bench where a variable optical attenuator ("VOA" in Fig. 1(a)) was hosted to set the proper photon counting rate. The output of each u-bench is another 600 µm diameter core fiber, which was fixed to the 3D printed probe through an SMA connector (as reported in Fig. 1(b)). The maximum average power at the fiber tip was about 2.7 and 2 mW for 670 and 830 nm, respectively. The 3D printed probe also hosted the detector, a 3 × 3 mm 2 SiPM (nominal area = 9 mm 2 , S13360-3075CS by Hamamatsu Photonics K.K., Japan) with an overall effective active area of 7.38 mm 2 considering a FF of 82% and a dark count rate (DCR) of 550 kcps (at a temperature of the room of about 20°C).

Setup
The detector is mounted on a ceramic chip carrier and buried inside a suitable optical resin that provides both optical access and electrical isolation. Thus, the detector was directly put in contact with the sample to exploit at best its large numerical aperture to maximize the collection of the backscattered light. The source-detector separation (SDS, computed as the distance between the injection fiber and the center of the detector) was set to about 3 cm (more in detail, 28.8 and 30.1 mm for the 670 and 830 nm respectively). Under this condition, the lateral size of the detector (3 mm) is expected to have negligible effects on the spatial resolution of investigation [49].
As can be seen in Fig. 1(b), two radiofrequency electrical cables were arranged on the probe and connected to the SiPM, providing the biasing to the detector (56.3 V, i.e. 4 V above the SiPM breakdown level), and conveying the output signal to a fast low-noise broadband pulse amplifier (Ortec VT120A, AMETEK Inc, USA, "pre-amplifier" in Fig. 1(a)).
The timing electronics used for the proposed system was a high-throughput 8-channel TCSPC module (MultiHarp 150, Picoquant GmbH, Germany, "MH150" in Fig. 1(a)) featuring a subnanosecond dead-time (650 ps), thus allowing to sustain up to 180 Mcps in histogramming mode [50]. The temporal resolution of its time scale (i.e. the size of the time bin) was 80 ps. The sync input signal to the TCSPC module was provided by the laser synchronization output (as depicted in Fig. 1(a)).
The present system is bulky, because of laser and TCSPC electronics. Yet, suitable compact probe-hosted laser sources have been already demonstrated [23], as well as integrated TCSPC devices capable of extremely high throughput [38,39,44,45,51]. All these advances can permit in the near future the fabrication of extremely compact and wearable TD diffuse optics devices.
Since in-vivo measurements are affected by biological fluctuations that can confound the information, diffused photons were simultaneously recorded also by a 1 mm 2 fiber-based SiPM module [31]. The detection fiber (core diameter 1 mm, step index) was screwed in the same 3D-printed probe and the collected light was focused onto the active area of the SiPM module. Also for this control channel, the SDS was set to about 3 cm (28.8 and 30.1 mm for 670 and 830 nm, respectively) and the arrangement of the detection fiber was chosen so as to investigate the same portion of the sample probed by the 9 mm 2 SiPM. The DCR of the 1 mm 2 module is about 125 kcps. The output signal of the SiPM module was then sent to a second channel of the MH150 to record the DTOF. In this way, during in-vivo measurements it is possible to record the same biological fluctuation for a better comparison of differences in terms of signal-to-noise ratio from the two detectors.

Description
It is well known and widely discussed in the literature that classical TCSPC instruments are characterized by their dead-time (T dead ), which is usually in the order of tens to hundreds of nanoseconds [52,53]. During T dead , the detection chain is blind and all imping photons are neglected. The pile-up effect is a phenomenon that rises from the capability of the classical TCSPC board to detect only one photon per laser cycle. Since only one photon can be detected, the early photons are those that have a higher probability to be recorded, while the late ones are more prone to be lost, thus leading to an artificial increase in the slope of the DTOF tail [52]. Obviously, the distortion in the curve is more important as the number of impinging photons increases, thus conventionally the recorded count-rate is kept within the 1-5% of the laser repetition rate [52] and, as recently discussed by Cominelli et al. [54], the laser period should preferably be an integer multiple of dead-time to avoid further distortion of the DTOF. If this condition is not matched, some portions of the DTOF are recorded with lower efficiency, resulting in regions of the DTOF where almost no photons are detected (e.g. a sort of undershoot of counts followed by a bump, that is a fingerprint of the recovery of the detection chain).
In this setup, the dead-time of the timing board is 650 ps [50], thus the pile-up effect is dominated by the T dead of the 9 mm 2 SiPM detector and its front-end and amplification electronics, which was estimated to be ≈3.6 ns (assessed using an oscilloscope from the minimum distance between two consecutive distinguishable avalanche pulses). Such a value is much smaller than the standard dead-time of TD-systems, thus allowing a strong potential increase of the count-rate (the maximum saturated sustainable CR of the detector alone is around 270 Mcps, if it is retriggered as soon as it recovers from dead-time), provided that possible non-idealities (e.g. pile-up) are taken into account.

Correction algorithm
The Coates algorithm was applied to the acquired curves before any other processing (e.g. background noise subtraction) [55]. It is based on the calculation of the true probability (S i ) that a photon can be detected at the i-th time bin under the hypothesis that more than one photon can be detected in one cycle (i.e. no dead-time). This probability is computed from the number of photons (N) recorded both in the ith channel (N i ) and in the previous i-1 channels, but considering also the overall number of cycles of the measurements (M). The Coates' formula states: where The corrected number of counts in the i-th channel can be computed as N i = M S i . Although the algorithm is simple and easy to be implemented, it does not need any assumption such as the apriori knowledge of the correct photon distribution or the uniformity in the time-bins [55].
Since the dead-time of the detector is about 3.6 ns, the correction is valid for a temporal range stretching no more than 3.6 ns from the beginning of the rising edge of the DTOF. In practice, we decided to consider the 3.6 ns validity range starting from the peak of the Instrument Response Function (IRF) since very few photons are detected before that, particularly with SDS=3 cm. After pile-up correction, all other post-processing analysis (e.g. background subtraction) were performed.

Measurements and data analysis
To rigorously characterize the system, we made use of internationally-agreed protocols (BIP, MEDPHOT and nEUROPt [47][48][49]) for performance assessment of diffuse optical instruments. In the following, measurements and data analysis procedures are presented.
Unless specified, all the tests for all protocols were repeated at different CRs (namely, 1, 15 and 30 Mcps nominal values) by acting on the VOA and observing the total photon counting rate displayed by the TCSPC module software. Only one wavelength (670 nm) was used for phantom characterization as the phantoms in use do not show any relevant spectral fingerprint and the characteristics of components were similar at both wavelengths.

BIP protocol
The BIP protocol was designed to assess the basic hardware performances of the system [47]. In this case, we adopted the following figures of merit: the detector responsivity, the Differential Non Linearity (DNL), the IRF, and the temporal stability.
The responsivity quantifies the overall harvesting capability of the system. It is computed as the ratio between the number of photons detected at the output surface of a calibrated reference phantom to mimic a nearly Lambertian angular profile and the power injected on the opposite side. Responsivity depends on many different factors, such as quantum efficiency, area and numerical aperture of the detector, collection optics, and dead-time of the timing electronics [47]. More details on the measurement procedure and the phantom used can be found in Ref. [47]. In our case, the responsivity was computed at the following nominal CRs: 1, 10, 15, 20, 30 and 40 Mcps (i.e. the maximum CR compatible with a laser operating at 40 MHz repetition rate). For all cases, the calculation was repeated both with and without pile-up correction to check the effect of the pile-up.
The DNL test assesses the non-uniformity in the time bins. The number of photons detected in one channel is proportional to its width, thus any non-uniformity creates distortions in the DTOF. To quantify the DNL with a good Signal-to-Noise Ratio (SNR), a CW illumination yielding at least 10 5 counts per channel is needed. In our case, due to the different CR of the measurements, we modified the number of repetitions so to have a number of counts per channel of about 10 6 , independently of the CR. In the ideal case, the number of detected photons in different time bins should be the same so, to assess any deviation, the BIP protocol suggests to compute the peak-to-peak variation in the number of counts recorded per bin with respect to the mean value of counts (ε DNL ), as reported in Eq. (3): where N DNL,max (N DNL,min ) is the maximum (minimum) number of counts recorded across the different time bins, while N DNL is the mean number of photons detected over the whole time scale.
Due to the not flat background of the 9 mm 2 detector (as highlighted in the results and discussion section), to assess the DNL we used as a detector the 1 mm 2 fiber-based SiPM (see Fig. 1(a)). This procedure is justified as the DNL value depends only on the timing electronics board.
To characterize the time resolution of the whole system, the BIP protocol dictates to record the IRF (i.e. the temporal response of the instrument when no diffusive sample is present). To correctly replicate the measurement conditions, a thin layer of Teflon was inserted to fill the whole acceptance area of the detector with all the light propagation modes of the injection fiber. As stated in the BIP protocol, 20 repetitions of 1 s each were acquired. The IRF was taken for the three CRs in order to assess any possible issue (e.g. pile-up effect, see Sect. "Correction algorithm") arising from the use of a count-rate above the single-photon statistics.
Another important feature that can have a large impact on the system performances is the temporal stability, which was monitored acquiring continuously the IRF for 1 hour starting from the system start-up. The changes in the temporal position of the IRF as well as its amplitude (overall number of counts in the pulse after background subtraction) and shape (assessed by its Full-Width at Half Maximum, FWHM) were considered.

MEDPHOT protocol
The MEDPHOT protocol [48] aims to assess the capability of a system to recover µ a and µ s ' of a homogenous medium over a wide range of optical properties. The MEDPHOT phantom kit is composed of 32 phantoms divided in 4 series (each one having a fixed scattering value, nominally 5, 10, 15 and 20 cm −1 at 800 nm). For each series, there are 8 phantoms featuring different absorption coefficients (nominally from 0 to 0.35 cm −1 at steps of 0.05 cm −1 at 800 nm). The phantoms are made of epoxy-resin and calibrated quantities of titanium dioxide and black toner were added to provide scattering and absorption properties. For further details about the phantoms see Ref. [48]. Considering the typical optical properties of the human tissues (such as brain and muscle), we adopted a subset of the MEDPHOT phantom kit featuring a range of absorption from 0.07 to 0.28 cm −1 and reduced scattering from 7 to 16 cm −1 [56] at 670 nm.
The whole phantom subset was measured to assess the linearity in the recovery of µ a and µ s ' as well as the possible coupling between retrieved absorption (scattering) vs. nominal scattering (absorption): 20 repetitions of 1 s were acquired for each phantom. To retrieve the optical properties, we first summed up all 20 repetitions to improve SNR and then we fitted the data using the analytical model of the photon transport in a diffusive semi-infinite medium under the diffusion approximation [57]. The fitting algorithm reaches the convergence minimizing the χ 2 using a Levenberg-Marquardt routine [58]. To take into account the non-idealities of the system, the IRF shape (recorded at the same CR as the measurement) was convolved to the analytical model. The curves were fitted in the range spanning from 20% of the peak on the rising edge down to the 5% of the tail of the curve. The fitting range was chosen as the one giving the best results for phantoms with low signal (i.e. characterized by high scattering and absorption) at 1 Mcps and was kept constant for all measurements. To check the effect of the pile-up and the possibility to properly correct its effects, the fitting procedure was repeated for curves with and without pile-up correction.

nEUROPt protocol
The nEUROPt protocol [49] was meant to objectively assess the capability of an optical brain imaging system to detect, localize and quantify the absorption changes occurring in the brain. In this paper, we adopted only the sensitivity tests (contrast and contrast-to-noise-ratio), which are related to the capability of the system to detect a localized inhomogeneity buried in depth within a homogeneous diffusive medium.
The contrast (C) is defined as the relative change in the number of photons when the perturbation is present (heterogeneous case) with respect to the homogeneous state (i.e. no perturbation), as reported in Eq. (4): where N 0 and N are the overall number of counts recorded in the homogeneous and heterogeneous case, respectively. The contrast-to-noise ratio (CNR) is, on the other hand, an index of the robustness of the contrast with respect to the noise (e.g. Poisson noise and other possible sources of fluctuations): where N 0 − N is the mean value of the difference over the repetitions between the homogeneous and heterogeneous case while σ(N 0 ) is the standard deviation of the number of counts over several acquisitions of the homogeneous curve. The typical criteria for detectability of absorption perturbations buried inside scattering media are: i) C ≥ 1% (to distinguish the change in N 0 in living tissues) and ii) CNR ≥ 1 (to grant that the change in N 0 exceeds its fluctuation, due to e.g. Poisson noise) [11].
Both contrast and CNR can be computed inside different time gates (i.e. selecting given portions of the DTOF with respect to the IRF peak) to improve the depth selectivity since the arrival time of the photons encodes the mean visited depth. In this case, N 0 and N in Eq. (4) and (5), are no longer referred to the whole DTOF but must be computed for the specific gate. For the analysis, we used 5 time gates with a 480 ps width.
As for the MEDPHOT protocol, the calculation of C and CNR was repeated with and without pile-up correction to check its effect and the possibility to correct distortions.
For the measurements, we made use of a mechanically switchable phantom [59]. It is composed of a main block with homogeneous optical properties (µ a = 0.1 cm −1 and µ s ' = 10 cm −1 at 670 nm) in which a cylindrical cavity hosts a movable cylinder. Such a rod has the same optical properties of the bulk phantom and it embeds a totally absorbing inclusion of 98 mm 3 , featuring an absorption perturbation equivalent to an absorption increase of 0.17 cm −1 over a 1 cm 3 volume (see Ref. [60] for further details). The probe in Fig. 1(b) was placed on the phantom in order to have the perturbation always centered with the middle point of the source-detector couples, independently of its depth. The rod was moved so as to have the center of gravity of the inclusion ranging from 2 to 40 mm at steps of 2 mm below the phantom surface, thus testing the depth sensitivity of the system. For each position, 20 repetitions of 1 s each were recorded. The homogeneous measurement was acquired when the inclusion was completely out from the bulk phantom, thus its effect was completely negligible. The targeted CRs (i.e. 1, 15 or 30 MHz) were set acting on the VOA in this last condition and therefore they are referred to the CRs in the homogeneous cases. The CR in the heterogeneous case (i.e. when the effect of the perturbation is not negligible) is typically lower due to the presence of the absorption perturbation.

In-vivo measurements
After validating the system on phantoms, we challenge it with in-vivo motor cortex activation adopting a finger tapping exercise [61]. In this case, both wavelengths were used to enable functional measurements by recovering both oxygenated and deoxygenate hemoglobin concentrations (HbO 2 and HHb, respectively). For comparison purposes, the signal was concurrently recorded using the external 1 mm 2 fiber-based SiPM module (see Setup section for details).
Three healthy subjects were enrolled in the experiment. Written informed consent was given by participants and measurements were approved by the Ethical Committee of Politecnico di Milano and conducted in compliance with the Declaration of Helsinki.
Each experiment consisted of 5 repetitions of 3 consecutive phases: 20 s of baseline (i.e. the subject was resting), 20 s of finger tapping task (at a self pace of about 2-3Hz) and finally 20 s of recovery (with the subject resting again). The overall duration was 5 minutes.
The probe was located on the motor cortex (C3 position according to the 10/20 EEG international system) and, to measure both ipsilateral and contralateral hemodynamics changes, in two different experiments the subject was asked to perform the full 5-minute experiment first with left hand and then with the right hand.
DTOFs were acquired every 1 s simultaneously at both wavelengths. A method based on the computation of the mean time pathlengths travelled by the photon in a 2-layer medium [62] was used to derive the absorption coefficient changes ∆µ a,j (λ, T) in each layer j (with j = 1, extra-cerebral tissue, and j = 2, intra-cerebral tissue) at each experimental time T. Absolute values for the baseline absorption coefficient µ a0 (λ) were then obtained by fitting the average DTOFs in the baseline to a homogeneous model for photon diffusion [57]. The time course of the absorption coefficient was then derived as µ a,j (λ, T) = µ a0 (λ) + ∆µ a,j (λ, T). Finally, hemodynamic concentration time courses HbO 2 (T) and HHb(T) were obtained by Beer's law.

Results and discussion
Here the results and the discussion of the measurements described in the previous paragraph are presented and commented. Table 2 reports the responsivity values obtained at the different target CRs, both using the not-corrected DTOFs and the corrected ones. In the "count rate (MHz)" column is reported both the "true value" (i.e. the actual TCSPC module CR obtained due to the VOA discretization) and the "corrected value" (i.e. the total amount of counts in the DTOF after pile-up correction). Looking at the values obtained using the not-corrected curves, it is clear that the increase in the CR is responsible for pile-up effects, thus reducing the ratio between the number of photons detected and those actually impinging on the detector. Such a reduction results in a decrease of the responsivity for the measurement at 40 Mcps, which is less than half of the supposed true value (i.e. at 1 Mcps, within single-photon statistics). This means that more than half of the actual photons are not detected due to pile-up.

BIP protocol
To remedy this issue, we used Coates algorithms which corrects for the number of detected photons at high throughput, thus leading to a fairly constant value of the responsivity up to CR=30 Mcps (variation always lower than 1.5% with respect to 1 Mcps, in agreement with the expected experimental variability). When a count-rate of about 40 Mcps is reached, the recovered value is still a bit underestimated (about 7%) with respect to the supposed true value, meaning that the correction is effective even if not perfect. This may be due to other sources of photon losses that arise at such large CRs. For example, it is well-known and widely discussed in literature that SiPMs suffer from avalanche pulse pile-up at high detection rates [63]. Despite sharing a similar name, this phenomenon is completely different from the statistical distortion affecting TCSPC systems as it is related to the decay tail of the detector electrical response, which may generate severe baseline fluctuations affecting the detector response. Clearly, the avalanche pulse pile-up cannot be addressed by the Coates correction, which assumes the IRF to be constant. To avoid confusions, in the following we will always refer to the statistical distortion of the DTOF in TCSPC instruments simply as "pile-up effect", and to the electrical detector behavior with the term "avalanche pulse pile-up". The DNL error was computed to be always lower than 3.5% and it is independent of the CR used in measurements. This independence was predictable since the DNL error depends only on the time-bin width of the timing electronics, thus there is no expected variation with the CR of the measurement.
As a general comment, these values of responsivity at 670 nm are, to the best of our knowledge, the highest values ever reported for a TD diffuse optics system, thus allowing us to easily reach the desired CR of 30 Mcps with many of the phantom tests described in the following and during in-vivo measurements despite the use of a large SDS (3 cm).
The IRF at different CRs is reported in Fig. 2. More in detail, Fig. 2(a) reports the raw curves acquired, while Fig. 2(b) shows the same curves after pile-up correction and background subtraction. Looking at the IRFs before pile-up correction and background subtraction (Fig. 2(a)), one could notice the expected peak increase with the CR. At the same time, also the background level raises, but with a lower rate with respect to the signal, which is potentially due to the afterpulsing of the detector [64]. Another feature to be underlined is the presence of the bumps mainly visible for 15 and 30 Mcps, which occur after about 4 ns and 8 ns from the peak. As discussed in pile-up section, they are due to the dead-time of the detector and the effect is in line with what reported by Cominelli et al. in Ref. [54].
As usually done in TD-measurements, to improve the dynamic range (DR) of the measurements, the average value of the background noise is removed. The IRF curves after pile-up correction and background subtraction are reported in Fig. 2(b). From comparison with the previous graph, it is evident the improvement in DR and also the change in the DTOF shape due to the pile-up correction (mostly evident in the 30 Mcps curve around the peak region). In this condition, the FWHM of the IRFs is 308, 356 and 556 ps at 1, 15 and 30 Mcps of nominal count rates, respectively. This change is again most probably due to the avalanche pulse pile-up effect, that is a well-known source of distortions in the IRF shape of SiPMs operated at high CRs [63]. As specified in Sect. 2.3, when the IRF shape is relevant for the analysis (as for MEDPHOT tests), the IRF is taken at the same CR (therefore needing pile-up correction) as that of the phantom measurements in order to have a similar detector response. This however cannot grant an identical behavior of the detector during IRF and phantom measurements since the avalanche pile-up effect is expected to depend on the probability that one photon can arrive soon after a previous photon detection and therefore on the shape of the optical waveform under measurement.
Regarding the temporal stability of the IRF, the results obtained are reported in Fig. 3. The stability of the IRF was evaluated in terms of counts, pulse width (FWHM) and temporal position of the center of gravity after 10 minutes of warm up. For counts and pulse width, data are displayed as a percentage variation with respect to the mean value (computed on the 50 minutes of measurements after warm-up), while for the temporal position we plot the variation (in ps) with respect to the average value. For all parameters considered, we can clearly notice a good system stability with a peak-to-peak variation within the ±0.5% for the counts and pulse width and ±10 ps for the temporal position of the curves. Such results are completely in line with other state-of-the-art instruments for brain imaging [21,56]. Figure 4 reports the results of the MEDPHOT protocol tests. In the first and fourth rows are shown the linearity in the retrieval of absorption and reduced scattering properties respectively for the different count-rates (columns). Table 3 reports the slope and R 2 obtained from linear fitting for the linearity measurements for both not-corrected (NC) and corrected (C) DTOFs at all CRs. We should warn here that the nominal values are those obtained in a multi-laboratory exercise for performance assessment of diffuse optics instruments [48]. This is the best estimate of the true optical parameters of the MEDPHOT kit available so far. It surely permits to properly assess the linearity of the system, while precise estimate of the accuracy (i.e. discrepancy in absolute values) is still questionable since no clear consensus was reached, differently from the characterization of liquid phantoms where a discrepancy <2% was reached in a different multi-laboratory exercise [65].  For absorption linearity of not-corrected curves, there is in all cases (scattering series and CR) a good linearity (R 2 always larger than 0.987) while in most cases (for 1 and 15 Mcps) there is a general underestimation of the recovered µ a (slope <1). For CR of 30 Mcps, an improvement in the linearity (both in terms of recovered value -i.e. slope value close to 1 -and R 2 ) is present for all scattering series. For what concerns the corrected curves, it is possible to notice that for the 1 Mcps there is no effect in the retrieved absorption value.

MEDPHOT protocol
This was expected since at 1 Mcps the system is working within single-photon statistics, thus there is no relevant pile-up effect to be corrected. On the other hand, for measurements taken at 15 and 30 Mcps, the Coates correction seems to further underestimate the retrieved µ a , without improving the linearity (R 2 is constant or even slightly lower). We cannot provide a definitive explanation for this effect. It might be due to the fit procedure which weighs more the first part of the tail, where the pile-up is affecting less the DTOF. Indeed, we adopted the reduced χ 2 as penalty term in the fitting procedure assuming shot-noise for data. Yet, whenever the model is not completely adequate to describe the data, the points with higher number of counts gain more weight with respect to the pure shot-noise case. At any rate, at 30 Mcps the underestimation of both the absorption and scattering coefficients obtained after Coates correction allows one to improve the accuracy in the µ a retrieval and also to get more uniform values of µ s ' as compared to what obtained with the other CRs (see plot for absorption linearity at 30 Mcps and scattering to absorption coupling graphs in Fig. 4).
On the other hand, results obtained in scattering linearity show the weaknesses of the 9 mm 2 area system in the retrieval of µ s ', mostly for low CRs. Indeed, for 1 Mcps the recovered µ s ' are not much linear and this trend is more evident for phantom series with larger absorption (R 2 going from 0.9289 with µ a = 0.07 cm −1 to 0.6338 for µ a = 0.28 cm −1 ). This might be due to the large amount of background noise which reduces the DR for fitting, thus impairing the retrieval of the optical properties, especially for the scattering coefficient which is more prone to artefacts related to non-idealities of the system. Yet, we have no proof this is the key reason. The increase in the CR improves the linearity of the µ s ' retrieval both in terms of slope (close to 1) and R 2 . For 15 Mcps measurements, the effect of Coates correction is almost negligible, while for 30 Mcps ones the correction compensates the slight overestimation of the µ s ' but at the expenses of the R 2 .
Regarding the absorption to scattering (scattering to absorption) coupling, conclusions can be drawn looking at the second (third) row of Fig. 4. It is evident that for the most scattering phantoms there is a strong coupling between the true µ a and the retrieved µ s ' of the phantom for all CRs (second row). For phantom series featuring µ s ' = 7.3 and 11.9 cm −1 , this effect is less evident. On the other hand, the coupling between retrieved the true µ s ' and the retrieved µ a (third row) is less pronounced.
In all cases, in the fitting procedure the detector was considered as point-like one. Indeed, the use of the correct value of the detector area showed no improvement in the retrieval of the optical properties (data not shown) while the computational complexity increased. Figure 5 reports the computed contrast (top row) and CNR (bottom row) as a function of the perturbation depth obtained for different CRs (columns) for an early and late gate (from 0.72 to 1.2 and from 2.64 to 3.12 ns with respect to IRF peak, respectively).

nEUROPt protocol
Overall, for 15 and 30 Mcps the effect of the Coates correction is evident while, as expected, for 1 Mcps is negligible. Generally speaking, without correction, pile-up effect is expected to artificially reduce the number of detected photons in the DTOF. This decrease is typically larger for homogeneous-case DTOFs as heterogeneous-case ones are often characterized by a lower count rate due to the presence of the absorption perturbation. By analyzing the effect of this phenomenon in Eq. (4), one can notice that pile-up distortion will result into a reduction of the contrast. Indeed, this is clearly visible in Fig. 5 for the contrast computed at early gate since, upon increasing the CR, the amplitude of the contrast function decreases. After pile-up correction, it recovers the same value as expected. Actually, as also reported in [66], the contrast should not depend on the number of detected photons. On the other hand, without correction the CNR for the early gate (slightly) increases for 15 Mcps while, due to pile-up, it decreases for the 30 Mcps. The expected trend, that is an increase in the CNR proportional to the square root of the number of counts (see [66]), is recovered upon applying the Coates correction.
The situation is more complicated for the late gate (i.e. 2.64 to 3.12 ns from IRF peak). Indeed, for 15 Mcps the contrast compression is clearly visible, while at 30 Mcps the not-corrected contrast shows an anomalous profile with an undershoot whose peak is below -0.1 (precisely -0.18 at 14 mm, not shown). To better understand this behavior, Fig. 6 shows the homogeneous and heterogeneous curves (orange and blue curves, respectively) acquired at 30 Mcps for three significant depths (14, 20 and 24 mm) both before and after applying the Coates correction (top and bottom row, respectively). The temporal position of the late gate is highlighted with the gray rectangle while the inset shows a zoom of curves in the gray regions. For not-corrected curves, due to pile-up effect, the DTOF of the homogeneous case presents a steeper decay with respect to the heterogeneous one. This is expected as in the heterogeneous case the number of detected photons decreases due to the presence of the absorbing perturbation, thus reducing the effect of the pile-up. For this reason, at 14 mm depth, the late gate for the homogeneous case is lower than the heterogeneous one, thus resulting in a negative contrast. This does not happen at early times as the crossing of the curves is at about 2 ns. When the perturbation is deeper, its effect on the DTOF is weaker, thus reducing the difference in terms of pile-up distortion between the homogeneous and heterogeneous cases and shifting the crossing point toward later times. For this reason, at a depth of 20 mm, the contrast computed in the late gate is almost null, while for the depth of 24 mm, it goes back to the positive (expected) value. When the Coates algorithm is applied (second row of Fig. 6), the corrected curves in the homogeneous case are always larger than the heterogeneous ones, thus leading to a positive contrast, which is more marked at 20 mm (i.e. separation between curves in the homogeneous and heterogeneous case is evidently larger).
Considering the contrast in the late gate for corrected DTOF, from Fig. 5 it is evident that the peak value increases along with the count-rate, which is not in agreement with theoretical expectations in ideal conditions (i.e. when the IRF shape does not change with the CR) [66]. Such an effect can be due to the avalanche pulse pile-up effect, which produces a change in the IRF shape when increasing the CR. Indeed, as it was simulated in Ref. [66], larger FWHM of the IRF does not improve the overall contrast, but leads to a shift of the contrast as a function of the photons arrival time towards later times [67]. Therefore, for certain delays of the late gate chosen for computing the contrast, it is possible to observe an improvement of the contrast when increasing the IRF FWHM.
For what concerns the CNR for late gate, the increase in the CR is beneficial, provided that the pile-up correction is applied. Considering the corrected curves, the improvement of the CNR with respect to the standard case (i.e. 1 Mcps) is of 546% (105%) and 940% (210%) for 15 and 30 Mcps respectively at the late (early) gate. In these standard optical conditions, this penetration is similar to the maximum ever reported [27]. However, the investigation of the maximum achievable depth of investigation is beyond the scope of this paper as the phantom presents a weak heterogeneity in its structure [59], limiting the performance in this kind of investigation. This is for instance the reason for the negative contrast reported in Fig. 5 at shallow depths, a characteristic always detected when it has been previously used for the implementation of the nEUROPt protocol (see e.g. [24]).
In order to limit the additional distortion due to the avalanche pulse pile-up effect, for in-vivo measurements we decided to use a count-rate for each wavelength of 15 Mcps. For this reason, Fig. 7 reports a more comprehensive view of the contrast (top row) and CNR (bottom one) obtained for the depth scan at 15 Mcps after applying the Coates algorithm for pile-up correction. Here, contrast and CNR have been computed considering five different time windows. It is possible to better appreciate that upon increasing the delay of the gate used for analysis, the peak of the contrast is shifted toward deeper positions and the contrast has a larger value if considering later gates. This is due to the fact that the analyzed photons are progressively remitted later, thus they statistically explored a deeper region of the phantom.
For the same reason, the shift toward deeper positions can be seen also for the CNR. In this case, however, due to the reduced number of photons, the value decreases, but for the latest gate (2.64-3.12 ns) it is still about 12 (i.e. well above the value of 1 usually considered as the lower limit for detectability, together with contrast > 1% [11]) even for depth of 32 mm (where contrast is 5%).  Figure 8 reports the contrast at different time-gates for the two wavelengths (colors), for both the 1 mm 2 area fiber-based SiPM and for the 9 mm 2 area in-contact SiPM, during the finger tapping experiment performed by Subject 1. Each gate has a width of 0.5 ns and it is set at delays increasing from -0.5 to 3.5 ns with respect to the IRF peak (0 ns time).

In-vivo measurements
As expected, for both detectors we observe an increased contrast for late time gates with an almost comparable amplitude. The visible difference between the two detectors is the noise level that is appreciably higher in the 1 mm 2 area fiber-based SiPM. Indeed, the noise level is dependent on the total number of collected photons. Table 4 shows the average number (N), standard deviation (σ(N)) over the five repetitions, and coefficient of variation (real CV = σ(N)/N, Poisson CV = (N) −0.5 ) of photons in the baseline collected by the 1 mm 2 fiber-based SiPM and by the 9 mm 2 in-contact SiPM during the in-vivo motor task for the two wavelengths and three subjects. Data refers to the folding average of 5 repetitions. The ability of the 9 mm 2 in-contact SiPM to collect a larger number of photons is clear. Actually, the larger area is not sufficient to explain the difference between the two detectors. An additional feature to be taken into account is the fact that the 1 mm 2 SiPM is fiber based (1 mm core diameter step index silica optical fiber, NA = 0.39), while the 9 mm 2 SiPM is directly in contact with the tissue: indeed, the limited NA plays a key role in determining the collected signal. Regarding the CV, we notice that the real CV over the five repetitions is comparable for the two detectors (being influenced, as expected, by the physiological variations), while the ideal Poisson CV (being related only to the number of collected photons) is of course much less for the 9 mm 2 SiPM, therefore helping discriminability of activation.
Finally, to present the results of the in vivo test in a way that is directly understandable by researchers in the field of functional near infrared spectroscopy, Fig. 9 shows the time course of O 2 Hb and HHb variation (∆O 2 Hb and ∆HHb) during the tapping experiment for the three subjects. For Subject 1 the canonical hemodynamic response (increase in O 2 Hb and decrease of HHb) is clearly visible both with the 1 mm 2 fiber-based SiPM and the 9 mm 2 in-contact SiPM, the former being definitely much noisier. Conversely, for Subject 3 the activation is only

Conclusions and future perspectives
In this paper we presented a new detection chain for TD diffuse optics, which features a large sensitive area (9 mm 2 with 82% fill-factor, resulting in an overall effective active area of 7.38 mm 2 ), coupled to a high-throughput timing electronics (180 Mcps), thus greatly improving the light harvesting capability (i.e. responsivity) with respect to the state-of-the-art. To the best of our knowledge, this is the first system which addresses together both technological bottlenecks of the current TD instruments (i.e. limited light harvesting and throughput). Making use of internationally shared protocols for performances assessment, we validated the use of this system for contrast measurements showing an improvement in the achievable CNR. On the other hand, the system showed limited performances for spectroscopy applications due to unsatisfactory retrieval of scattering properties. However, for many applications (e.g. detection of brain activation) the scattering coefficient, differently from the absorption coefficient, is not expected to vary during the tests and -if needed-a characterization of the static baseline optical properties can be easily performed before the test with a proper spectroscopic instruments or a dedicated instrument channel. The improved performances of the system during a functional test were demonstrated with in-vivo motor cortex measurements, where the robustness of the signal recorded with the 9 mm 2 in-contact SiPM is greatly enhanced with respect to a system using a fiber-based 1 mm 2 SiPM.
The system can further evolve in the near future. The current version of the system is susceptible to ambient electromagnetic interferences (in particular from pulsed voltage signals in the laser heads, which can slightly modulate the background noise if the laser heads are too close to the detector, limiting the possibility of miniaturization in the present implementation) Fig. 9. Time courses of ∆O 2 Hb and ∆HHb measured over the left motor cortex for 3 subjects (columns) during finger tapping exercise done with right and left hand (first two and last two rows, respectively). Both results obtained with 1 mm 2 fiber-based SiPM and 9 mm 2 in-contact SiPM (odd and even rows, respectively) are displayed. The error bars are computed over the 5 repetitions of the task. and its behavior is affected by the detector dead time as well as its DCR. For all those issues, an improvement in the detector readout electronics as well as a cooling down of the device will permit to push the performances even further. Regarding the post-processing analysis of data, more advanced algorithms for pile-up correction may be used, even if in some cases they require the acquisition of data in time-tagging mode [68].
As future perspectives, the proposed single channel device can be easily extended to 8 channels (the TCSPC board can in fact host up to 8 inputs), thus allowing fast acquisitions with higher count-rate with respect to state-of-the-art TD probe-hosted tomographic system as, for example, the one presented in [69].
We envisage that the adoption of larger area detectors -possibly with also time-gated capability -will dramatically improve the performances of the system, approaching the theoretical limits. This will result in the development of a new generation of devices which may open the way to several unexplored applications as for instance the exploration of the human head and chest in transmittance geometry.