Ultra-Sensitive PIN-Photodiode Receiver

A monolithically integrated receiver consisting of a low-capacitance PIN photodiode and a photo-charge integrating amplifier in 0.18-μm CMOS is introduced, launching a new class of ultra-sensitive optical receivers. The integrated PIN photodiode has a junction capacitance of 1.5 fF at a light-sensitive diameter of 30 μm, a responsivity of 0.39 A/W at 635 nm, and rise/fall times of 0.73/0.92 ns at a reverse bias of 20 V. The common-source amplifier integrates the photo-charges on the smallest possible integration capacitor, which is a gate-drain overlap capacitance. The data bits are reconstructed by double sampling. In this way, the sensitivity of SPAD receivers is achieved without using any impact ionization. At 50 Mb/s, a sensitivity of −56.4 dBm for a bit error ratio (BER) of 2 × 10⁻³ is obtained at a wavelength of 635 nm.


I. INTRODUCTION
MOST optical receivers use transimpedance amplifiers with resistive feedback to convert the photocurrent to a signal voltage [1]. PIN photodiodes are very popular detectors, e.g. using the compound semiconductor InGaAs for telecom wavelengths. Avalanche photodiodes (APDs) operated in the linear mode allow the sensitivity to be increased, although they are governed by excess noise [2]. Silicon PIN photodiodes and linear-mode APDs have also been used, e.g. for plastic optical fiber [3], [4], [5] and free-space [6] applications. Sensitivities of −35.8 dBm, −35.5 dBm, and −32.2 dBm at 500 Mb/s, 1 Gb/s, and 2 Gb/s, respectively, with BER = 10⁻⁹ and with 675 nm light were achieved by a 0.35 μm BiCMOS receiver with an integrated 200 μm-diameter APD and a feedback resistor of 1.5 kΩ [6]. Approaching the quantum limit more closely is possible with single-photon avalanche diodes (SPADs) [7], [8]. A sensitivity of −55.7 dBm was achieved at 50 Mb/s with a 4-SPAD receiver and 635 nm light [8]. Due to dark counts and afterpulsing of SPADs, several SPADs and detection of more than one photon per bit are necessary [8] to reduce the BER below the limit for error correction of 2 × 10⁻³ [9]. Nevertheless, the dead time has to be larger than several nanoseconds to keep the afterpulsing probability low enough. Therefore, many SPADs are necessary to increase the data rate. In [7], a 64 × 64 SPAD receiver in 0.13 μm CMOS reached a data rate of 400 Mb/s at a sensitivity of −49.9 dBm with 450 nm light. Circuit complexity, chip area, and power consumption of such SPAD receivers are high. Further issues are the photon detection probability, which was about 40% in [7], [8], and the optical fill factor of 43% [7] and 53% [8]. Furthermore, in SPAD arrays, optical crosstalk can cause additional errors [10]. Despite all the effort put into these SPAD receivers, gaps to the quantum limit of 12.2 dB [7] and 18.7 dB [8] remained.
Table I summarizes the pros and cons of the different types of detectors with respect to application in receivers.
So, the interesting question is whether ultra-sensitive PIN-photodiode receivers are possible, keeping the following three aspects in mind. First, PIN photodiodes achieve quantum efficiencies from 70% to above 90% and a fill factor of 100%, i.e. only one PIN photodiode is needed. Second, image sensors integrate the photo-charge on each photodiode and achieve single-photon resolution [11]. However, the light-sensitive area of their photodiodes is only about 1 μm² [12] and the frame rates are below 1 kframe/s [13]. The light-sensitive area of the photodiodes and the "data rate" would have to be increased. Third, the noise of feedback resistors can be avoided by using photo-charge integration on a feedback capacitor. For the second and third aspects, the signal voltage caused by the photo-charges on a capacitor has to be sufficiently above the noise of the amplifier. This requires a very small capacitor for charge integration if SPAD receiver performance is the aim.
In Section II, we address the first and second aspects. The third aspect is dealt with in Section III. Results are presented in Section IV and are discussed and compared to the state of the art of SPAD receivers in Section V. Section VI concludes the article.

II. LOW-CAPACITANCE PIN-PHOTODIODE
The integrated PIN photodiode uses a circular N++/N-well spot as cathode in the center of the device. The anode consists of a surface P++/P-well ring and the bottom of the P+ substrate. Fig. 1 depicts the three-dimensional structure of the spot photodiode, which combines a vertical and a lateral PIN structure. The P⁻ epitaxial layer is 24 μm thick and boron doped at a resistivity of 1000 Ω·cm. The small cathode radius of 2 μm and the thick, low-doped epi layer ensure a very low capacitance at a light-sensitive area of 707 μm² [15].
The dependence of the photodiode's p/n-junction capacitance on the reverse bias, obtained with the device simulator ATLAS [14], is shown in Fig. 2. At 20 V, its capacitance is 1.5 fF and its dark current (leakage current) is less than 1 pA [15]. According to the design manual of the process used, a 30 μm long minimum-width metal-3 line and two contact-via posts add about 1.3 fF. The value extracted from the layout for the metal line is 1.5 fF. The total capacitance of the PIN photodiode, including the metal line connecting it to the input transistor, is therefore 3 fF. The responsivity of the spot PIN photodiode at 635 nm is 0.39 A/W, corresponding to a quantum efficiency of 0.7634. The transient response measured with a Tektronix TDSC6124C 12 GHz oscilloscope at 675 nm is presented in Fig. 3. The 10%-90% rise and fall times at the reverse voltage of 20 V are 0.73 ns and 0.92 ns, respectively. More details of the photodiode, such as electric-field distributions, are described in [15].
In summary, the spot PIN photodiode offers a higher detection efficiency than SPADs and SPAD arrays and provides a much larger light-sensitive area than image sensor pixels at a comparable capacitance. Owing to the measured −3 dB bandwidth of above 300 MHz [15], data rates of up to 500 Mb/s should easily be possible with the spot PIN photodiode.
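The relation between responsivity and quantum efficiency used above can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and the standard relation η = R·h·c/(q·λ) is assumed. With R = 0.39 A/W at 635 nm it gives a value of about 0.76, close to the 0.7634 quoted in the text; the small difference presumably stems from rounding of the responsivity.

```python
# Physical constants (exact SI values).
PLANCK_H = 6.62607015e-34            # J*s
SPEED_OF_LIGHT = 299_792_458         # m/s
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def quantum_efficiency(responsivity_a_per_w: float, wavelength_m: float) -> float:
    """Return eta = R * h * c / (q * lambda) for a photodiode (hypothetical helper)."""
    photon_energy_j = PLANCK_H * SPEED_OF_LIGHT / wavelength_m
    return responsivity_a_per_w * photon_energy_j / ELEMENTARY_CHARGE


# Values taken from the text: R = 0.39 A/W at 635 nm.
eta_d = quantum_efficiency(0.39, 635e-9)
print(f"quantum efficiency at 635 nm: {eta_d:.3f}")
```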

III. RECEIVER CIRCUIT
The photogenerated charges could simply be integrated on the photodiode itself, as is done in 3-transistor active pixel image sensors (see e.g. [16]) or in an integrate-and-dump receiver as in [17]. However, in the latter case, the integration capacitor is formed not only by the photodiode's capacitance and the capacitance of the metal line. The input capacitance of the amplifier also contributes to it. This prohibits a large signal voltage.
Using a poly-poly or a metal-insulator-metal (MIM) capacitor as integration capacitor is, however, also not a good idea, because the smallest available or recommended capacitors in design kits have a value of 10 fF. Instead of developing a smaller custom-designed capacitor, it is most advantageous to exploit a "parasitic" capacitor of a MOSFET, the gate-drain overlap capacitance C_GD. In this way, parasitic capacitances of metal connections to a separate capacitor are also avoided. Most interesting, however, is that C_GD allows an integration capacitor of 0.33 fF per μm gate width for the 1.8 V NMOS transistor in the 0.18 μm CMOS process used, which actually implements a low-power core and therefore leads to a rather high gain A of about −30 of a common-source amplifier stage. Fig. 4 shows the simplified circuit diagram with M_1 serving as amplifying transistor in common-source (CS) configuration and with its C_GD simultaneously serving as integration capacitor. M_1 is implemented with two minimum-width gate fingers, and the PMOS transistor M_2, operated as constant-current device, possesses four minimum [...]
The reset transistor M_R (see Fig. 4) is activated in each bit for 3.75 ns. Due to its gate-drain and gate-source overlap capacitances, charges are injected at both clock edges. For a minimum-width reset transistor (W = 0.22 μm), we obtain a gate-drain overlap capacitance of 0.0726 fF. With this capacitance, a charge injection of 1.31 × 10⁻¹⁶ C or about 820 electron charges occurs due to clock feedthrough for a 1.8 V clock step. Such a large charge injection causes higher voltage steps at the input and output nodes of the integrator than the signal voltage aimed for from the integration of low photo-charge. Therefore, the compensation MOS capacitors M_C1 and M_C2 are added (inspired by [18], [19]), which must have half the width of M_R and which receive the inverted clock.
Because they cannot have half the minimum width W_min, the width of M_R has to be two times W_min. In this way, the effect of clock feedthrough should be eliminated except for mismatch. Because each bit is sampled twice (before and after integration) to determine the bit value (double sampling), signal distortion due to mismatch and imperfect reset can be filtered out to a certain extent.
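The clock-feedthrough estimate can be reproduced with a short calculation. The sketch below assumes the simple model Q = C_GD · ΔV with the overlap capacitance of 0.33 fF per μm gate width and the 1.8 V clock step given in the text; the function name is hypothetical. It returns roughly 820 electrons for the minimum width of 0.22 μm and about twice that for the doubled width.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # C

C_OVERLAP_PER_UM = 0.33e-15  # F per um of gate width (value from the text)
CLOCK_STEP_V = 1.8           # V, clock swing (value from the text)


def feedthrough_charge(gate_width_um: float):
    """Return (overlap capacitance in F, injected charge in C, electron count)."""
    c_gd = C_OVERLAP_PER_UM * gate_width_um
    charge = c_gd * CLOCK_STEP_V
    return c_gd, charge, charge / ELEMENTARY_CHARGE


# Minimum width W_min = 0.22 um and the doubled width actually used for M_R.
for width_um in (0.22, 0.44):
    c_gd, charge, electrons = feedthrough_charge(width_um)
    print(f"W = {width_um} um: C_GD = {c_gd * 1e15:.4f} fF, "
          f"Q = {charge:.3e} C, about {electrons:.0f} electrons")
```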
To make best use of the small integration capacitor used in Fig. 4, a high gain A of the common-source amplifier is advantageous, since the integration efficiency η_int (the photo-charge is not transferred completely to C_int) depends on the input-node capacitance C_T (C_PD + C_ML + C_in; C_PD photodiode capacitance, C_ML capacitance of the metal connection, C_in input capacitance of the amplifier), on C_int, and on the gain A:
$$\eta_{int} = \frac{|A|\, C_{int}}{C_T + (1 + |A|)\, C_{int}} \qquad (1)$$
The integration efficiency is 0.65 for A = −30, C_int = 0.356 fF (including extracted parasitics, see below), and C_T = 5.4 fF. Therefore, using the integrator with the feedback capacitor leads to a much higher signal voltage than integrating on the "photodiode" (better: on the input-node capacitance without C_in) directly. The improvement of the signal voltage is η_int · C_PD/C_int = 5.5 for η_int = 0.65, C_PD = 3 fF, and C_int = 0.356 fF.
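The quoted numbers can be checked numerically. The sketch below is not taken from the paper: it assumes an ideal charge-amplifier model in which η_int = |A|·C_int / (C_T + (1 + |A|)·C_int), with C_T not including C_int, and the function name is hypothetical. Under this assumption the values given in the text (A = −30, C_int = 0.356 fF, C_T = 5.4 fF) reproduce η_int ≈ 0.65 and an improvement factor of about 5.5.

```python
def integration_efficiency(gain: float, c_int: float, c_t: float) -> float:
    """Fraction |v_out| * C_int / Q for a charge Q delivered to the input node.

    Assumed model: ideal inverting amplifier with gain `gain` (negative),
    feedback capacitor c_int, and input-node capacitance c_t excluding c_int.
    """
    a = abs(gain)
    return a * c_int / (c_t + (1.0 + a) * c_int)


# Values from the text.
eta_int = integration_efficiency(gain=-30.0, c_int=0.356e-15, c_t=5.4e-15)

# Signal-voltage improvement over integrating on 3 fF directly.
improvement = eta_int * 3.0e-15 / 0.356e-15

print(f"eta_int = {eta_int:.3f}, improvement = {improvement:.2f}")
```

In this model the efficiency approaches 1 only for |A|·C_int much larger than C_T, which illustrates why a high gain matters for such a small feedback capacitor.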
The properties of the receiver, and especially the integration efficiency, strongly depend on the layout and on the parasitics extracted from it. The optimum layout was selected from several versions. The extraction finally added a capacitance of 0.211 fF in parallel to the integration capacitor C_GD.
The signal voltage v_S at the output of the integrator is
$$v_S = \frac{\eta_d\, \eta_{int}\, Q_{ph}}{C_{int}} \qquad (2)$$
with Q_ph = n_ph · q, where n_ph is the number of photons incident on the photodiode, q is the electron charge, and η_d is the quantum efficiency of the photodiode.
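As a rough numerical illustration, the signal voltage can be evaluated for the 294 photons per "1" bit reported in Section IV. The sketch assumes v_S = η_d · η_int · n_ph · q / C_int, which is the form implied by the photon-noise expression given later in this section; the function name is hypothetical. With the values from the text it gives a signal of the order of 65 mV at the integrator output, before the post-amplifier.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def signal_voltage(n_photons: float, eta_d: float, eta_int: float, c_int: float) -> float:
    """Integrator output voltage for n_photons incident photons (assumed model)."""
    return eta_d * eta_int * n_photons * ELEMENTARY_CHARGE / c_int


# Values from the text: 294 photons per "1" bit, eta_d = 0.7634,
# eta_int = 0.65, C_int = 0.356 fF.
v_s = signal_voltage(294, 0.7634, 0.65, 0.356e-15)
print(f"v_S = {v_s * 1e3:.1f} mV")
```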
Furthermore, we have to consider the noise of the receiver. The noise voltage v_n at the output of the integrator was determined by circuit simulation. Thermal channel noise (3.95 mV) and 1/f noise (10.9 mV) contribute, both integrated from 1 Hz to the noise-equivalent bandwidth of 120 MHz. In addition, there is reset noise (kT/C noise; k is Boltzmann's constant and T is the absolute temperature) due to the thermal noise of the reset transistor, whose actual noise voltage at the moment of switching the reset transistor off is unknown [20]. For C_GD = 0.145 fF, we obtain a reset noise of √(kT/C_GD) = 5.3 mV, and even when the extracted capacitance of 0.211 fF is considered in addition, the reset noise is still 3.4 mV. Double sampling eliminates kT/C noise [21]. However, it increases the thermal noise voltage of the integrator by a factor of √2, while reducing the 1/f noise considerably [22], [23]. Here, double sampling is even more important because of the imperfect compensation of injection charge in the investigated receiver, as will become clear in Section IV.
Another noise aspect is photon noise [24], which is characterized by the standard deviation √n_ph for the average number of photons n_ph. This standard deviation leads to an additional noise voltage at the integrator output due to different numbers of photons being present in different bits.
In sum, we obtain the total mean-square noise voltage v_n,t² = 2·v_n² + v_n,ph² for a "1" bit, where the factor 2 comes from double sampling and v_n,ph = √n_ph · η_d · η_int · q/C_int stands for the photon noise transformed into the corresponding uncertainty of the signal voltage at the output of the integrator.
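The reset-noise figures follow from the kT/C relation. The sketch below assumes a temperature of 300 K, which is not stated in the text, and uses a hypothetical function name. Under that assumption it reproduces about 5.3 mV for C_GD = 0.145 fF alone and about 3.4 mV when the extracted 0.211 fF is added.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K
TEMPERATURE_K = 300.0       # assumption: room temperature, not given in the text


def reset_noise_v(capacitance_f: float, temperature_k: float = TEMPERATURE_K) -> float:
    """RMS reset (kT/C) noise voltage sqrt(k*T/C) on a capacitor."""
    return math.sqrt(BOLTZMANN_K * temperature_k / capacitance_f)


noise_cgd_mv = reset_noise_v(0.145e-15) * 1e3              # C_GD only
noise_total_mv = reset_noise_v((0.145 + 0.211) * 1e-15) * 1e3  # C_GD plus extracted parasitic

print(f"kT/C noise: {noise_cgd_mv:.1f} mV (C_GD), {noise_total_mv:.1f} mV (with parasitics)")
```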
With the signal voltage and the noise voltages, we can now estimate the bit error ratio (BER) via the so-called Q-factor (a measure of the signal-to-noise ratio). The Q-factor for BER = 2 × 10⁻³ is about 3. For BER = 10⁻⁹, Q = 6 is necessary. The Q-factor can be obtained from [25], [1]:
$$Q = \frac{\langle v_{s,1} \rangle - \langle v_{s,0} \rangle}{v_{n,t,1} + v_{n,t,0}} \qquad (3)$$
where ⟨v_s,1⟩ and ⟨v_s,0⟩ are the mean values of the signal voltage for a logical "1" and a logical "0", respectively. The total noise voltages of a logical "1" and of a logical "0" are v_n,t,1 and v_n,t,0, respectively. In our case, the signal voltage of a "0" is approximately zero, because the extinction ratio of the light source is very high (>200 [8]), and the photon noise for the "0" is negligible. The noise voltages at the integrator output for logical "1" and "0" are equal (v_n,1 = v_n,0 = v_n) because the operating point should shift only in the mV range. This leads to
$$Q \approx \frac{\langle v_{s,1} \rangle}{\sqrt{2 v_n^2 + v_{n,ph}^2} + \sqrt{2}\, v_n} \qquad (4)$$
as an approximation for the Q-factor, where the factors 2 are valid for double sampling. Unfortunately, the operating point of the integrator depends on previous bits due to the reset problem (see below). Therefore, the gain of the amplifier varies and the measured performance cannot verify the behavior expected from the above model.
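The link between Q-factor and BER can be made concrete with a small sketch. It assumes Gaussian noise statistics, so that BER = ½·erfc(Q/√2), and the simplified Q expression described above (zero signal for a "0", equal amplifier noise for both levels, double sampling). The function names are hypothetical. Under the Gaussian assumption, Q = 3 gives a BER slightly below 2 × 10⁻³ and Q = 6 gives a BER of about 10⁻⁹.

```python
import math


def total_noise_one(v_n: float, v_n_ph: float) -> float:
    """Total noise of a '1' bit with double sampling: sqrt(2*v_n^2 + v_n_ph^2)."""
    return math.sqrt(2.0 * v_n**2 + v_n_ph**2)


def q_factor(v_s1: float, v_n: float, v_n_ph: float) -> float:
    """Approximate Q for a '0' level of zero and no photon noise in a '0' bit."""
    noise_one = total_noise_one(v_n, v_n_ph)
    noise_zero = math.sqrt(2.0) * v_n
    return v_s1 / (noise_one + noise_zero)


def ber_from_q(q: float) -> float:
    """BER for Gaussian noise and an optimum threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


for q in (3.0, 6.0):
    print(f"Q = {q}: BER = {ber_from_q(q):.2e}")
```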
Directly after transistor M_R is released for a new integration cycle, an incomplete reset or charge injection may cause an error voltage V_NO or V_NI at the output node OUT or at the input node IN, respectively. If these error voltages remained constant during a measurement cycle, correlated double sampling would cancel them out completely. Unfortunately, an exponential settling to a final value with a time constant τ must be considered, which may cause an error in the difference between the two sample values. To calculate the dynamic voltage behavior and, as a consequence, the time constant τ, Kirchhoff's current law is applied to nodes IN and OUT (see (5) and (6)).
In (5) and (6), V_IN and V_OUT are the (error) voltages at IN and OUT, respectively, referred to the operating point. C_L is the overall load capacitance at node OUT, g_mn is the transconductance of transistor M_1, and g_0 is the summed output conductance of transistors M_1 and M_2. I_PD is the photocurrent flowing into the cathode of the photodiode. To calculate V_IN and V_OUT, (5) and (6) have to be separated into one equation for V_OUT (7) and one for V_IN (8).
We solved (7) and (8) only for those cases that are relevant for our application. Due to linearity, however, a superposition of different solutions for V_OUT(t) and V_IN(t) is again a solution.
• [...] (9) and (10).
• An error voltage V_OUT(0) = V_NO is caused, e.g., by charge injection to the output node OUT. Setting I_PD = 0 results in (11) and (12).
• Equations (13) to (15) show the result for a constant photocurrent [...]
All solutions above consist of parts with exponential settling, which is described by the time constant τ in (16). When choosing the sampling instants for measurements, it is important to allow a settling time of at least about 4τ, so that error voltages are cancelled out effectively by correlated double sampling. Error voltages might occur directly after the reset switch M_R is released and are caused by charge injection, incomplete reset, or noise. In our design, τ was calculated to be 298 ps for the ideal operating point. An incomplete reset additionally causes the operating point to shift from detection cycle to detection cycle, because the error voltage after settling depends on the previous bit. A series of identical bits might introduce a considerable voltage shift. As a consequence, the integrator leaves the original small-signal regime at the ideal operating point, and new small-signal parameters are valid.
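Since the node equations are not reproduced in this text, the following LaTeX sketch only illustrates how a single settling time constant can arise in such an integrator; it is not the authors' derivation. It assumes a linear small-signal model in which C_T is the input-node capacitance to ground excluding C_int, the photocurrent I_PD discharges node IN, and all voltages are referred to the operating point. These sign and grouping conventions are assumptions.

```latex
% Kirchhoff's current law at nodes IN and OUT (assumed sign conventions)
\begin{align*}
  C_T \frac{dV_{IN}}{dt} + C_{int}\,\frac{d(V_{IN}-V_{OUT})}{dt} &= -I_{PD} \\
  C_L \frac{dV_{OUT}}{dt} + C_{int}\,\frac{d(V_{OUT}-V_{IN})}{dt}
      + g_{mn} V_{IN} + g_0 V_{OUT} &= 0
\end{align*}
% For I_PD = 0 the first equation gives
%   dV_IN/dt = C_int / (C_T + C_int) * dV_OUT/dt.
% Inserting this into the second equation yields a first-order equation
% in V_OUT with the time constant
\begin{equation*}
  \tau = \frac{C_L\,(C_T + C_{int}) + C_T\, C_{int}}
              {g_0\,(C_T + C_{int}) + g_{mn}\, C_{int}}
\end{equation*}
```

In the same model, a constant photocurrent produces a ramp at OUT whose slope corresponds to an integration efficiency of |A|·C_int / (C_T + (1 + |A|)·C_int) with A = −g_mn/g_0. For the reported τ = 298 ps, the 4τ settling criterion corresponds to about 1.2 ns.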
The chip photo of the fabricated receiver is presented in Fig. 7, and Fig. 8 depicts the core's layout. The gain of the differential amplifier and of the output driver (see Fig. 4) is 11.0 according to circuit simulations. The power consumption of the receiver is 24.15 mW including the output driver, which dissipates 21.6 mW. The CS amplifier has a current of only 13.7 μA and the dummy inverter draws twice this current. The active area of the core circuit (without post-amplifier and output driver) is 0.01452 mm².

IV. RESULTS
The PRBS (2⁷ − 1) return-to-zero (RZ) signal modulating the laser was generated with a Sympuls BPG 40G-128M bit pattern generator, which allowed us to control the length of the laser pulses inside a logic "1". The generator is programmable; by generating the complete PRBS signal in MATLAB as a single bitstream with 128 (sub-)bits per data bit, a laser pulse with a length of 26.5% of a bit, i.e. of 5.3 ns duration, was realized. The laser with a wavelength of 635 nm [8] was externally modulated using this signal from the pattern generator.
The 50 MHz clock was generated by a Centillax TG1C1-A clock synthesizer and fed into the BPG 40G-128M and into a second bit pattern generator, a BMG 12GIG, where the clock period was further divided into 64 time slots. This allowed the length and position of the reset signal inside a single data bit to be defined by setting each time slot to "0" or "1".
Using a Keysight MSOV204A mixed-signal oscilloscope, the output stream of the receiver and the reset signal were stored. With a sampling rate of 20 GS/s and a bandwidth of 7 GHz, more than 900 000 bits were stored. Due to the huge amount of data, both output signals were measured, but only the differential output signal of the receiver (v_out1 − v_out2, see Fig. 4), calculated inside the oscilloscope, was stored.
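The timing resolution of the two pattern generators follows directly from the numbers above. The sketch below is illustrative only: the count of 34 sub-bits for the laser pulse is an inference from the stated 26.5%, not a value given in the text, and the 3.75 ns reset duration is taken from Section III.

```python
BIT_RATE = 50e6             # bit/s
SUB_BITS_PER_BIT = 128      # laser pattern generator resolution
RESET_SLOTS_PER_BIT = 64    # reset pattern generator resolution

bit_period_ns = 1e9 / BIT_RATE                        # 20 ns per data bit
sub_bit_ns = bit_period_ns / SUB_BITS_PER_BIT         # laser timing step
reset_slot_ns = bit_period_ns / RESET_SLOTS_PER_BIT   # reset timing step

# Assumption: the 26.5% pulse length corresponds to a whole number of sub-bits.
pulse_sub_bits = round(0.265 * SUB_BITS_PER_BIT)
pulse_ns = pulse_sub_bits * sub_bit_ns

# Number of reset slots needed for the 3.75 ns reset pulse.
reset_slots = round(3.75 / reset_slot_ns)

print(f"laser pulse: {pulse_sub_bits} sub-bits = {pulse_ns} ns")
print(f"reset pulse: {reset_slots} slots of {reset_slot_ns} ns")
```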
In MATLAB, the stored bitstreams were compared to the pseudo-random bitstream with the PRBS length of 2⁷ − 1 sent to the laser. Due to the signal propagation time in the electrical cables and in the optical fiber, a significant delay occurred between signal generation and the output signal of the receiver, which was compensated during the evaluation in MATLAB. Using the reset signal with its start at t = 0, the first (t_s1) and second (t_s2) sampling points were optimized for lowest BER. The first sampling point had to be sufficiently after the end of the reset and before the laser pulse (in the case of a logic "1"). The second sampling point had to be after the laser pulse and before the reset for the next bit. Again, due to signal propagation time, there is a delay between the measured reset time and the output signal, which gives an offset in the sampling times. Depending on whether the signal voltage Δ = v(t_s2) − v(t_s1) was above or below a threshold value v_th, it was decided whether a logical "1" or "0" was detected. The BER was optimized in MATLAB by varying t_s1 and t_s2 as well as v_th. The lowest BER was achieved with t_s1 = 4.8 ns and t_s2 = 14.5 ns.
Fig. 9. Bit sequence as measured differential output voltage.
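The double-sampling decision described above can be sketched as follows. This is not the authors' MATLAB evaluation: the function names, the synthetic trace, the assumed positive integration swing for a "1", and the threshold value are all illustrative assumptions. The sketch shows that a bit-dependent offset of the operating point drops out of the difference Δ = v(t_s2) − v(t_s1), provided the offset is constant within the bit.

```python
def sample_at(trace, sample_rate_hz, t_s):
    """Return the stored sample closest to time t_s (seconds from trace start)."""
    return trace[round(t_s * sample_rate_hz)]


def detect_bits(trace, sample_rate_hz, bit_period_s, t_s1, t_s2, v_th, n_bits):
    """Decide each bit from delta = v(t_s2) - v(t_s1) compared with v_th."""
    bits = []
    for k in range(n_bits):
        t0 = k * bit_period_s
        delta = (sample_at(trace, sample_rate_hz, t0 + t_s2)
                 - sample_at(trace, sample_rate_hz, t0 + t_s1))
        bits.append(1 if delta > v_th else 0)
    return bits


def bit_error_ratio(detected, reference):
    """Fraction of positions where detected and reference bits differ."""
    errors = sum(d != r for d, r in zip(detected, reference))
    return errors / len(reference)


# Synthetic example: 20 GS/s, 20 ns bits, per-bit offset plus a 170 mV step for "1".
SAMPLE_RATE = 20e9
BIT_PERIOD = 20e-9
SAMPLES_PER_BIT = 400
reference = [1, 0, 1, 1, 0, 0, 1, 0]
offsets = [0.00, 0.05, -0.03, 0.02, 0.08, -0.06, 0.01, 0.04]  # moving operating point

trace = []
for bit, offset in zip(reference, offsets):
    for i in range(SAMPLES_PER_BIT):
        t_ns = i * 0.05
        step = 0.17 if (bit and t_ns >= 10.0) else 0.0
        trace.append(offset + step)

detected = detect_bits(trace, SAMPLE_RATE, BIT_PERIOD, 4.8e-9, 14.5e-9,
                       v_th=0.085, n_bits=len(reference))
print(detected, bit_error_ratio(detected, reference))
```

In a real evaluation, t_s1, t_s2, and v_th would be swept to minimize the BER, as described in the text.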
The blue vertical lines in Fig. 9 represent the sampling instants after reset and before integration. The green vertical lines represent the sampling instants after the integration. Unfortunately, the DC operating point depends on the preceding bits. This indicates that the reset transistor is not strong enough; it seems to need a larger width. The integrated photo-charge causes a voltage change of up to about 170 mV. The dependence of the operating point on the reset and on previous bits causes negative voltage spikes with a height of up to about 300 mV. The compensation with the MOS capacitors obviously does not work perfectly. Fig. 9 also shows that, due to the imperfect reset and the resulting shift of the operating point, the post-amplifier and output driver saturate for many of the negative spikes (especially around 200 ns in Fig. 9), since many negative charge-injection spikes are clipped or compressed.
Fig. 10 plots the obtained BER values as a function of the optical input power. Since the BER for the average optical power of −56.4 dBm is very close to 2.0 × 10⁻³ (actually somewhat better), we take this power value as the sensitivity for BER = 2.0 × 10⁻³. This sensitivity corresponds to an average optical power of 2.3 nW, which means 147 photons at 635 nm within each bit. Since half of the bits are "0"s and the other half are "1"s, this leads to 294 photons in a "1" bit. If a forward error correction method such as RS(255,239)/CSOC (n0/k0 = 7/6, J = 8) or a super FEC code is used, the BER can even be 6.5 × 10⁻³ to obtain an output BER of 10⁻⁹ [26]. Then, the sensitivity of the investigated receiver is somewhat better than −57.0 dBm.
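The conversion from sensitivity in dBm to photons per bit can be sketched as below. The function names are hypothetical, and equal numbers of "0" and "1" bits with no light in a "0" are assumed, as in the text. The result is roughly 2.3 nW and about 146 to 147 photons per bit on average, depending on rounding, i.e. close to the 294 photons per "1" bit quoted above.

```python
PLANCK_H = 6.62607015e-34     # J*s
SPEED_OF_LIGHT = 299_792_458  # m/s


def dbm_to_watt(power_dbm: float) -> float:
    """Convert optical power from dBm to watts."""
    return 1e-3 * 10 ** (power_dbm / 10)


def photons_per_bit(power_dbm: float, wavelength_m: float, bit_rate: float) -> float:
    """Average number of photons arriving within one bit period."""
    energy_per_bit_j = dbm_to_watt(power_dbm) / bit_rate
    photon_energy_j = PLANCK_H * SPEED_OF_LIGHT / wavelength_m
    return energy_per_bit_j / photon_energy_j


avg_power_w = dbm_to_watt(-56.4)
n_avg = photons_per_bit(-56.4, 635e-9, 50e6)
n_one = 2 * n_avg  # all light is assumed to arrive in the "1" bits

print(f"{avg_power_w * 1e9:.2f} nW, {n_avg:.1f} photons/bit, {n_one:.1f} per '1' bit")
```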

V. DISCUSSION AND COMPARISON
The clock feedthrough visible in Fig. 9 can be explained by mismatch between the reset transistor and the compensation MOSCAPs M_C1 and M_C2. We also have to be aware that the reset transistor has two times the minimum gate width, which causes twice the number of injected electrons estimated above, i.e. 1640 electrons. The obtained sensitivity can be translated to 294 photons in a "1", which correspond to about 223 electrons considering the quantum efficiency η_d of 0.7634 for R = 0.39 A/W. The ratio of injected electrons (not considering M_C1 and M_C2) to "signal" electrons would be about 7.4, whereas the observed clock injection is about two times the signal voltage. M_C1 and M_C2 thus partially compensate the clock feedthrough. However, the two MOSCAPs seem to mismatch with the reset transistor. Another possible explanation for the observed charge injection would be parasitic capacitances in the reset and compensation branch that were not found in the extraction procedure. In addition, Fig. 9 shows that the operating point moves depending on the bits received before. This also explains the different heights of the integration swing and of the voltage dips due to clock feedthrough.
Despite all imperfections, the achieved sensitivity of −56.4 dBm is slightly better than that of a 4-SPAD receiver, which reached −55.7 dBm at 50 Mb/s [8]. Fig. 11 compares the sensitivity of the described receiver to the state of the art.
The power consumption of the 4-SPAD receiver is 19.1 mW (380 pJ/bit) and the active area is 0.1 mm² (both without output buffers) [8].
The PIN-photodiode receiver of this work leaves a gap of 18 dB to the quantum limit [8], and with this value it is also better than the 4-SPAD receiver of [29] with a sensitivity of −46.3 dBm at 100 Mb/s. The one-SPAD receiver of [28], using a gating principle and sub-bits, achieves a gap to the quantum limit of 15.7 dB at 50 Mb/s and is only 2.3 dB better than the PIN-photodiode receiver of this work. Its power consumption is 41 mW (at 50 Mb/s, without 50 Ω output driver) and its active area is 0.66 mm² [28]. The 64 × 64 SPAD receiver in 0.13 μm CMOS reached a data rate of 400 Mb/s at a sensitivity of −49.9 dBm with 450 nm light, however at a high circuit complexity and a high power consumption of 230 pJ/bit (including ADC functionality used for 4-PAM) [7]. The active area of the receiver of [7] is 1.8 mm². The power consumption of the PIN-photodiode receiver of this work is 51 pJ/bit and its active area is 0.0145 mm² (both without output driver). Another SPAD-based receiver approach used a silicon photomultiplier (SiPM) containing 5676 passively quenched SPADs and achieved a sensitivity of −58.4 dBm at 400 Mb/s and −49 dBm at 1 Gb/s with 405 nm light and BER = 10⁻³ [30]. However, the SiPM is fabricated in a special technology and cannot be integrated in CMOS chips. Table II gives an overview of the state of the art and compares the performance of the suggested receiver.
For comparison, the linear-mode APD receiver of [6] in 0.35 μm BiCMOS technology achieved sensitivities of −35.8 dBm, −35.5 dBm, and −32.2 dBm at 500 Mb/s, 1 Gb/s, and 2 Gb/s, respectively, with BER = 10⁻⁹ and with 675 nm light. At 500 Mb/s, the gap to the quantum limit (for BER = 10⁻⁹) of this APD receiver is about 20 dB. In this context, the suggested PIN-photodiode receiver comes closer to the quantum limit than linear-mode APD receivers.
TABLE II. COMPARISON WITH THE STATE OF THE ART
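The energy-per-bit figures used in this comparison follow from power and data rate. The sketch below uses a hypothetical helper and takes the core power of this work as the total of 24.15 mW minus the 21.6 mW of the output driver, as stated in Section III. It gives about 51 pJ/bit for this work and about 382 pJ/bit for the 4-SPAD receiver of [8], consistent with the rounded values in the text.

```python
def energy_per_bit_pj(power_w: float, bit_rate: float) -> float:
    """Energy per bit in pJ for a given power (W) and data rate (bit/s)."""
    return power_w / bit_rate * 1e12


# This work: total power minus output driver, at 50 Mb/s.
core_power_w = 24.15e-3 - 21.6e-3
pin_pj = energy_per_bit_pj(core_power_w, 50e6)

# 4-SPAD receiver of [8]: 19.1 mW at 50 Mb/s (without output buffers).
spad4_pj = energy_per_bit_pj(19.1e-3, 50e6)

print(f"PIN receiver core: {pin_pj:.0f} pJ/bit, 4-SPAD receiver: {spad4_pj:.0f} pJ/bit")
```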

VI. CONCLUSION
The first ultra-sensitive PIN-photodiode receiver outperforms linear-mode APD receivers and some SPAD receivers in sensitivity (compared using the corresponding gaps to the quantum limit) without exploiting any impact ionization. The potential of the capacitive-feedback approach, however, does not yet seem to be exhausted. The low power consumption and the small active area of the receiver core make the suggested integrating receiver attractive compared to SPAD receivers.
The reset and the compensation of injection charges have to be improved to achieve the robustness and yield needed for practical applications, and an automatic mechanism for tuning the operating point of the integrator is highly recommended. Furthermore, the double-sampling and decision circuitry has to be integrated on the ultra-sensitive PIN-photodiode receiver chip to make it suitable for products.
The parasitics extraction from the layout added 1.5 times the capacitance of the integration capacitor. So, the actual integration capacitor had 2.5 times the ideally possible value of the gate-drain overlap capacitance. To exploit the potential of the basic idea fully, it will be crucial to find ways of reducing the effect of parasitic capacitances from the layout. Nevertheless, the effort for optimizing the layout will be high, because the actual performance after layout differs strongly from that of pre-layout simulations.