Overcoming the Transimpedance Limit: A Tutorial on Design of Low-Noise TIA

Noise is probably the single most important performance metric of the high-speed transimpedance amplifier (TIA), as it directly sets the sensitivity of the optical receiver. The transimpedance limit, which dictates the maximum achievable transimpedance gain of the TIA, also turns out to fundamentally limit the TIA noise performance. In this tutorial, we analyze and explore two circuit design approaches to overcome the transimpedance limit. The first approach (Type I) applies a divide-and-conquer methodology that separates the noise and bandwidth problems and solves them individually. The second approach (Type II) employs a multi-stage stagger-tuned amplifier. Both approaches can overcome the transimpedance limit, forming an effective toolkit for the design of low-noise high-speed TIAs for high-sensitivity CMOS optical receivers in current and future applications.

performance. This may be tolerated in some short-distance scenarios, such as small to mid-scale data centers that span less than a hundred meters. However, in long-distance scenarios that require higher sensitivity, such as 5G, Passive Optical Network (PON), and mega data centers, SiGe optical front-ends are generally preferred thanks to their superior noise performance. This is not only uneconomical as optical communication expands into a wider application space, but it also adds complexity, volume, and cost in system integration, since backend electronics are usually CMOS based. Furthermore, fully integrated approaches are not only preferred but will become mandatory in the near future to meet extreme bandwidth density and power requirements, such as in the co-packaged optics (CPO) application [4], [5]. The dilemma between insufficient performance and application trends motivates the investigation of CMOS-based optical communication front-end circuits.
For quite a while, shunt-feedback (SF)¹ has been the predominant topology for TIA realization, as it generally offers low noise and high gain for data rates up to a few Gb/s [6]. However, due to the square-law degradation of the transimpedance gain governed by the transimpedance limit [6]-[8], the SF-TIA becomes less effective at higher speeds, especially in advanced CMOS technology, which tends to be more hostile to analog circuits [9]. As a result, alternative but less noise-friendly TIA topologies such as the Common-Gate (CG) TIA [7], [10], the Regulated-CG TIA [11], and even the simple resistor-based TIA [12] have been adopted more often, due to their capability to deliver high bandwidth.
To make a low-noise CMOS optical front-end using the SF-TIA topology, one of the biggest challenges tends to come from the transimpedance limit [6]-[9], which dictates the maximum achievable transimpedance gain and, more importantly, the achievable noise performance. In this brief, we analyze and explore approaches to overcome the transimpedance limit, with the underlying goal of building low-noise high-speed TIAs.
The rest of this tutorial is organized as follows. Section II analyzes the transimpedance limit of the SF-TIA at the conceptual level, along with implementation issues in CMOS versus SiGe technology. Sections III and IV examine the Type I and Type II approaches to overcome the transimpedance limit, as well as their design examples and considerations. A comparison and design guidelines for the two approaches are discussed in Section V. Finally, Section VI concludes this brief.
¹ More precisely, shunt-shunt feedback.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. TRANSIMPEDANCE LIMIT IN SF-TIA
In this section, we review and analyze the transimpedance limit on SF-TIA. We will see why it not only limits achievable transimpedance gain but more importantly, sets the tone for the noise performance. Further, the transimpedance limit in the actual technology context is also examined, showing why the baseline limit is difficult to achieve in CMOS technology.

A. Review of Transimpedance Limit
The topology of the SF-TIA is shown in Fig. 1(a). C_T represents the total capacitance at the TIA input and R_F is the feedback resistor. The gain and pole of the forward amplifier within the feedback loop are −A and f_A, respectively. When the TIA speed is relatively low with respect to the technology used, f_A is often neglected. The TIA is then treated as a first-order system, with the transimpedance gain, bandwidth, and noise given by (1)-(3), respectively. From (1)-(3), it is clear that the feedback resistor R_F is at the center of the trade-off among transimpedance gain, bandwidth, and noise in the SF-TIA. We would like a larger transimpedance gain to reach low noise, which is at odds with high bandwidth.
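Equations (1)-(3) are not reproduced in this text. As a hedged reconstruction, assuming the standard first-order SF-TIA model with an ideal amplifier of gain −A and counting only the thermal noise of R_F in the noise expression (the original (3) may contain additional amplifier terms), they commonly take the following form, where k is the Boltzmann constant and T the absolute temperature (both symbols are assumptions not defined in the text):

```latex
\begin{align}
R_T &= \frac{A}{A+1}\,R_F \approx R_F \tag{1}\\
BW  &= \frac{A+1}{2\pi R_F C_T} \approx \frac{A}{2\pi R_F C_T} \tag{2}\\
\overline{i_{n}^{2}} &\approx \frac{4kT}{R_F}\,\Delta f \;+\; \text{(amplifier noise terms)} \tag{3}
\end{align}
```

In this form, raising R_F increases the gain in (1) and lowers the noise in (3), but reduces the bandwidth in (2) in direct proportion.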
As the TIA speed keeps increasing, f_A can no longer be neglected, and the TIA is better characterized as a second-order system. In this case, the input pole 1/(2πR_F C_T) is dominant and f_A is non-dominant, as shown in Fig. 1(b). As f_A moves closer to the open-loop gain-bandwidth product A/(2πR_F C_T), the second-order TIA shows, in turn [7], [13]: (a) a critically damped response, with two real poles and quality factor Q = 0.5; (b) a maximally flat group delay response, also known as the Bessel response, with two complex-conjugate poles and Q = 1/√3; (c) a maximally flat amplitude response, also known as the Butterworth response, with two complex-conjugate poles and Q = 1/√2. Continuing this trend, frequency-domain peaking sets in and the response degrades further.
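The relation between f_A and the pole quality factor can be sketched numerically. The following is a minimal sketch, assuming an ideal single-pole amplifier −A/(1 + s/(2πf_A)) with zero output resistance and all input capacitance lumped into C_T; the function names and component values are hypothetical and chosen only for illustration.

```python
import math


def closed_loop_q(A, RF, CT, fA):
    """Pole quality factor of the second-order SF-TIA model.

    Assumes an ideal amplifier -A / (1 + s*TA) with TA = 1/(2*pi*fA),
    zero output resistance, and a single input capacitance CT.
    """
    TA = 1.0 / (2.0 * math.pi * fA)
    tau_in = RF * CT
    return math.sqrt((A + 1.0) * tau_in * TA) / (tau_in + TA)


def amplifier_pole_for_q(A, RF, CT, Q):
    """Amplifier pole fA (Hz) giving a target Q with the input pole dominant."""
    # With x = RF*CT/TA, Q^2 = (A+1)*x/(1+x)^2, i.e. x^2 - b*x + 1 = 0.
    b = (A + 1.0) / Q**2 - 2.0
    x = (b + math.sqrt(b * b - 4.0)) / 2.0  # larger root: TA < RF*CT
    return x / (2.0 * math.pi * RF * CT)


if __name__ == "__main__":
    A, RF, CT = 10.0, 500.0, 100e-15  # illustrative values only
    for name, Q in [("critically damped", 0.5),
                    ("Bessel", 1 / math.sqrt(3)),
                    ("Butterworth", 1 / math.sqrt(2))]:
        fA = amplifier_pole_for_q(A, RF, CT, Q)
        print(f"{name:18s} Q={Q:.3f}  fA={fA / 1e9:.1f} GHz")
```

Under these assumptions, the Butterworth case lands at f_A close to twice the open-loop gain-bandwidth product A/(2πR_F C_T), and lowering f_A further raises Q beyond 1/√2.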
In general, the Butterworth response is preferred, since it maximizes the transimpedance gain while incurring only minor time-domain ringing and jitter. Under this condition, the transimpedance limit [6]-[9] is given by (4), where the amplifier gain-bandwidth product A × f_A roughly equals the technology transit frequency f_t. For a given set of electronic and photonic technologies, R_F therefore trades with the square of the TIA bandwidth, rather than linearly as in (2), predicting a rapid drop of transimpedance gain as well as a quick rise of noise at higher data rates. Although the gain can be compensated by cascading more stages after the TIA, the noise performance is fundamentally set by the TIA and only deteriorates as the signal passes through the later stages. Therefore, in view of the critical role of R_F in the noise performance of the TIA from (3), the transimpedance limit can also be viewed as a kind of noise limit. This motivates us to find ways to overcome the transimpedance limit.
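Equation (4) is not reproduced in this text. A hedged reconstruction, assuming the commonly quoted form of the transimpedance limit for a Butterworth-shaped second-order SF-TIA with a single-pole amplifier, is:

```latex
\begin{equation}
R_T \approx R_F \;\le\; \frac{A\,f_A}{2\pi\,C_T\,BW^{2}}
\;\approx\; \frac{f_t}{2\pi\,C_T\,BW^{2}} \tag{4}
\end{equation}
```

Read this way, doubling the bandwidth BW for a fixed f_t and C_T cuts the achievable R_F by a factor of four, which is the square-law degradation discussed above.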

B. CMOS vs. SiGe in SF-TIA
If we take into account the actual technology capability rather than the abstract f_t simplification of the previous analysis, the situation for CMOS is even worse. Fig. 2 shows typical SF-TIA realizations in CMOS and SiGe, the two main technology platforms today. We make the following observations.
Biasing limitation: The CMOS-inverter-based TIA has become a popular CMOS TIA realization thanks to its current reuse for more transconductance, its ability to achieve intrinsic gain at low VDD (∼1 V) in advanced CMOS technology, and its ease of design. However, even with a simple CMOS inverter configuration, there is not enough voltage headroom to bias both transistors at their peak f_t, which requires an overdrive voltage of more than 400 mV. With a threshold voltage of around 300 mV in advanced CMOS, the effective f_t only reaches a fraction of the peak f_t due to the low VDD. On the other hand, a SiGe TIA usually has a much larger VDD (∼3 V), allowing the transistor to be biased near its peak f_t.
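The headroom argument can be checked with simple arithmetic. This is a rough sketch that assumes the feedback resistor self-biases the inverter at input = output = VDD/2 and that the NMOS and PMOS have equal threshold magnitudes; both assumptions, and the function names, are illustrative rather than taken from the original design.

```python
def inverter_overdrive(vdd, vth):
    """Overdrive of each device in a self-biased inverter.

    Assumes input = output = VDD/2 and symmetric threshold voltages.
    """
    return vdd / 2.0 - vth


def min_vdd_for_overdrive(vov, vth):
    """Supply needed so that both stacked devices see the overdrive vov."""
    return 2.0 * (vth + vov)


if __name__ == "__main__":
    # Values quoted in the text: VDD of about 1 V, Vth of about 0.3 V,
    # and an overdrive above 0.4 V for peak f_t.
    print(f"available overdrive: {inverter_overdrive(1.0, 0.3):.2f} V")
    print(f"VDD for 0.4 V overdrive: {min_vdd_for_overdrive(0.4, 0.3):.2f} V")
```

Under these assumptions, a 1-V supply leaves roughly half of the overdrive needed for peak f_t, while meeting it on both devices would take a supply of about 1.4 V.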
Gain limitation: To actually reach the maximum limit in (4), a high gain A (with a corresponding f_A) is also required [9]. For the SiGe BJT, the transistor's intrinsic gain does not vary much with bias current density. For CMOS, however, at high-f_t biasing with a large bias current density, the transistor's intrinsic gain shrinks to a much smaller, single-digit value. Further, the low VDD makes it difficult to apply sophisticated gain-boosting techniques. As a result, it is often difficult for a CMOS TIA to reach the transimpedance limit in high-speed (high bias current) scenarios due to the lack of amplifier gain.
Miller capacitance: In a SiGe TIA, thanks to the high VDD, a cascode transistor can be easily added, which shields the input from Miller multiplication of the feedback capacitance. In a CMOS TIA, the low VDD makes it difficult to squeeze in a cascode transistor, leaving a large Miller capacitance at the TIA input and reducing the achievable R_F.
Buffer: Unlike in SiGe, the low VDD in advanced CMOS technology makes it very difficult to use a source follower as a buffer, so the capacitance of the post-TIA stage lowers the amplifier pole f_A and consequently the transimpedance gain according to (4). Furthermore, without a buffer, the amplifier's output resistance can no longer be neglected, and the input resistance of the TIA becomes (5), where the output resistance R_OUT of the amplifier appears and the actual R_F must be lowered to maintain bandwidth. On the other hand, in SiGe, the high VDD allows the use of a good buffer built with an emitter follower, which keeps f_A at high frequency and minimizes the amplifier output resistance.
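Equation (5) is not reproduced in this text. A hedged reconstruction, assuming a low-frequency model in which the amplifier has open-circuit gain −A and output resistance R_OUT, and the loading of the following stage is ignored, follows from applying a test current at the input node:

```latex
\begin{equation}
R_{IN} = \frac{R_F + R_{OUT}}{A + 1} \tag{5}
\end{equation}
```

Since the input pole is set by R_IN and C_T, any nonzero R_OUT has the same effect on bandwidth as an equally larger feedback resistor, without contributing to the transimpedance gain.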
Parasitic capacitance: Since the footprint of a SiGe transistor is generally much smaller than that of its CMOS counterpart for the same transconductance (g_m), the impact of parasitic capacitance is much more severe in CMOS than in SiGe, degrading the effective f_t.
Therefore, due to the limitations described above, CMOS-based TIAs are far less able than their SiGe counterparts to exploit the full potential of the technology. Combining these hostile characteristics of CMOS for SF-TIA realization, it is estimated that a CMOS SF-TIA can achieve only about 1/3 to 1/2 of the transimpedance gain of a SiGe realization at the same technology f_t, which paints a rather dismal picture for noise performance. This analysis is echoed by the fact that most commercial high-speed TIAs are realized in SiGe rather than CMOS. Circuit approaches that overcome the transimpedance limit of the CMOS SF-TIA are therefore highly desirable.

III. TYPE I APPROACH: BREAK TRANSIMPEDANCE LIMIT
The first approach, which we categorize as Type I, aims to circumvent the transimpedance limit. The difficulty of meeting the noise target (set by the transimpedance gain) and the bandwidth target at the same time motivates us to address them one at a time. Indeed, we can separate the noise and bandwidth goals and deal with them sequentially to break the transimpedance limit. Given that noise is usually much more difficult to improve while bandwidth compensation can be carried out conveniently, a noise-first, bandwidth-second two-step approach has been proposed [9], [14].
With this notion in mind, the dilemma of the R_F ∝ 1/BW² relation in the transimpedance limit equation becomes the solution. If we relax the bandwidth requirement for a moment and restore it later, R_F can be boosted in a square-law manner as the bandwidth shrinks, until the overall noise floor is reduced substantially. Once the noise problem is effectively tackled, the bandwidth can be compensated with relative ease, thanks to the rapid development of bandwidth extension techniques in recent decades [13]-[16].
Fig. 3. Type I approach: (a) topology and (b) its realization in 65-nm CMOS [9], [14].
In Fig. 3(a), following the methodology of the Type I approach, a two-stage front-end (TSFE) [9] is used instead of a single-stage TIA: the first stage focuses on noise and the second stage deals with bandwidth. The first TIA stage scales down its bandwidth by a factor of n, enabling up to an n² times boost of the feedback resistance based on the transimpedance limit. The factor n is chosen so that the resulting feedback resistance n²R_F is large enough to set a low noise floor. The second stage, an equalizer (EQ), then recovers the bandwidth by boosting the high-frequency content. While the idea of the Type I approach can be explained straightforwardly, two key concerns need to be addressed. First, at the conceptual level, does bandwidth compensation raise noise? Second, at the implementation level, to what extent should the bandwidth scaling be carried out?
First, let us examine the overall effect on noise. Although the EQ raises the high-frequency noise content, it does so after the low-bandwidth TIA stage, where the noise has already been low-pass filtered. Consequently, the net result is the same high-frequency noise characteristic as in the original full-bandwidth TIA. Importantly, in this procedure, the white noise is effectively reduced because of the boosted feedback resistor. A more detailed analysis and proof can be found in [9].
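The cancellation between the low-pass TIA stage and the equalizer can be illustrated with an idealized model. The sketch below assumes a first-order TIA stage, an ideal equalizer whose zero sits exactly on the TIA pole, and only the thermal noise of the feedback resistor; the function names and component values are hypothetical, and the detailed treatment is the one in [9].

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def tia_stage(f, rf, bw, n):
    """First-order TIA with bandwidth bw/n and feedback resistance n^2 * rf."""
    return (n**2 * rf) / (1 + 1j * f / (bw / n))


def eq_stage(f, bw, n):
    """Ideal equalizer: zero at bw/n cancels the TIA pole, new pole at bw."""
    return (1 + 1j * f / (bw / n)) / (1 + 1j * f / bw)


def rf_noise_density(rf, n, temp=300.0):
    """Thermal noise current density of the boosted resistor, in A/sqrt(Hz)."""
    return math.sqrt(4 * K_BOLTZMANN * temp / (n**2 * rf))


if __name__ == "__main__":
    rf, bw, n = 300.0, 15e9, 2  # illustrative values only
    for f in (1e9, 15e9, 30e9):
        h = tia_stage(f, rf, bw, n) * eq_stage(f, bw, n)
        print(f"{f / 1e9:5.1f} GHz  |Z| = {abs(h):7.1f} ohm")
    # White-noise density drops by a factor of n (up to rounding).
    print(rf_noise_density(rf, 1) / rf_noise_density(rf, n))
```

In this idealized model the cascade behaves as a single-pole response with the full bandwidth and n² times the transimpedance, while the resistor's white-noise current density falls by a factor of n.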
Next, let us consider the realization aspect. Although an arbitrarily large n could in principle reduce the white noise essentially to zero, it is worth verifying how large n must be to effectively suppress the white noise while remaining technically achievable. As shown in Fig. 3(a), as the bandwidth of the TIA stage scales down, the requirement on the amplifier within the feedback loop also changes, mandating a steady increase of the amplifier gain A. As analyzed in Section II-B, a high stage gain can be difficult to obtain in advanced CMOS due to high-speed biasing and limited VDD, which limits the achievable n. On the other hand, the high-frequency boost capability of the EQ may also limit n. In general, the peaking magnitude is inversely proportional to the peaking bandwidth. Therefore, if the TIA bandwidth is set too low, it will be difficult for the EQ to compensate the bandwidth smoothly, leading to undesired time-domain jitter and overshoot. To achieve effective broadband peaking, several EQ stages may be needed [17], increasing power and complexity.
The analysis so far assumes that the TIA scaling follows the transimpedance limit, where the core amplifier is re-configured together with R_F as indicated in Fig. 3(a). When it is difficult to re-configure the core amplifier as desired, the R_F boost factor can be less than the n² of the previous scenario. In the extreme case where the core amplifier is fixed, it equals the TIA bandwidth shrinkage factor n. In this fixed-amplifier scenario, the Q factor of the poles in the TIA transfer function decreases as the bandwidth is scaled down, and the initial roll-off becomes more gradual. This in turn eases the design of the EQ in forming an overall flat transfer function. Therefore, depending on the reconfigurability of the core amplifier and the desired shape of the transfer function, the R_F boost factor lies between n and n².
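The bounds on the R_F boost translate directly into bounds on the white-noise improvement. The following is a minimal sketch that considers only the thermal noise of the feedback resistor, whose current density scales as 1/√R_F; the function names are hypothetical.

```python
import math


def rf_boost_bounds(n):
    """Lower and upper R_F boost for a bandwidth shrinkage factor n.

    Lower bound: fixed core amplifier (boost = n).
    Upper bound: amplifier re-configured per the transimpedance limit (n^2).
    """
    return float(n), float(n) ** 2


def white_noise_rms_reduction(rf_boost):
    """Factor by which the resistor's rms noise current density drops."""
    return math.sqrt(rf_boost)


if __name__ == "__main__":
    for n in (1.5, 2, 3):
        lo, hi = rf_boost_bounds(n)
        print(f"n={n}: R_F boost {lo:.2f}x to {hi:.2f}x, "
              f"rms white-noise reduction "
              f"{white_noise_rms_reduction(lo):.2f}x to "
              f"{white_noise_rms_reduction(hi):.2f}x")
```

For n = 2, for example, the resistor's rms white-noise density improves by between √2 and 2 times, depending on how far the core amplifier can be re-configured.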
From the analysis above, co-design and co-simulation of the TIA and EQ are essential to set proper n and Q values that achieve the desired noise level as well as the desired shape of the overall transfer function. Fig. 3(b) shows a possible implementation of a 25-Gb/s TSFE in 65-nm CMOS under a 1-V supply voltage [9], [14]. The TIA stage uses the popular CMOS inverter as its amplifier, with the n factor set to around 2 and an NMOS cascode added to boost the amplifier gain needed for the larger R_F. In the second stage, the EQ, inductive shunt peaking is used to form the high-frequency boost. The bond-wire inductor L_series provides series peaking at the edge of the bandwidth to suppress high-frequency noise. Both R_F and R_D are designed to be tunable to better accommodate PVT variation. Paired with a III-V photodiode and using the pseudo-differential version of the front-end, it achieves an input-referred noise current of 1.8 µA rms over a 13.6-GHz electrical bandwidth, which is comparable to its SiGe counterparts.
In a similar implementation [18], paired with a silicon photonic Ge photodiode, the TSFE achieves an input-referred noise current of 0.91 µA rms over a 16.1-GHz electrical bandwidth, which is even better than its SiGe counterparts thanks to the small photodiode capacitance and the 3D-integrated co-design approach. Since its first introduction in [9], [14], the Type I approach has been widely used, with the EQ realized either in the analog domain [19]-[22] or in the digital domain [23]-[27].

IV. TYPE II APPROACH: EXCEED TRANSIMPEDANCE LIMIT
The transimpedance limit from (4) is based on the assumption that the core amplifier has only one stage. Since the cascaded multi-stage amplifier has a larger gain-bandwidth product [3], we could also seek ways to build a stronger amplifier to exceed the one-stage-based transimpedance limit, which we categorize as the Type II approach.
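The gain advantage of cascading can be made concrete with the textbook result for identical first-order stages. This is an open-loop sketch only: it assumes each stage has a fixed gain-bandwidth product (taken here as f_t), ignores inter-stage loading, and says nothing yet about closed-loop stability. The function names and numbers are hypothetical.

```python
import math


def cascade_bandwidth_factor(n_stages):
    """-3 dB bandwidth shrinkage for n identical first-order stages."""
    return math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)


def cascade_gain(ft, bw_total, n_stages):
    """Total DC gain of n identical stages sized to reach bw_total.

    Each stage is assumed to have gain-bandwidth product ft.
    """
    stage_bw = bw_total / cascade_bandwidth_factor(n_stages)
    stage_gain = ft / stage_bw
    return stage_gain ** n_stages


if __name__ == "__main__":
    ft, bw = 100e9, 10e9  # illustrative values only
    for n in (1, 2, 3):
        print(f"{n} stage(s): total gain = {cascade_gain(ft, bw, n):.1f}")
```

With these illustrative numbers, three stages deliver a total gain more than ten times that of a single stage at the same overall bandwidth, which is the attraction of the multi-stage amplifier before loop stability is taken into account.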
Although this idea is straightforward, it deserves careful evaluation. Critically, the stability of the closed-loop system becomes a major concern. A typical multi-stage high-speed broadband amplifier is often built from identical or similar gain stages, generating several non-dominant poles in the TIA loop. Therefore, the pole of each gain stage should be placed high enough that the overall phase shift at the open-loop unity-gain bandwidth (UGB) frequency is small enough to retain the loop phase margin. From a bandwidth point of view, this means each stage must provide a fairly large bandwidth, leaving little gain in each stage and diminishing the benefit of cascading for boosting total gain. As a matter of fact, to make a multi-stage amplifier effective in this case, an f_t/BW ratio of 10 should be satisfied [7], [8] for a 3-stage TIA design.
Fig. 4. Type II approach: (a) topology and (b) its realization in 180-nm CMOS [28], [30].
To put this analysis in perspective, consider how much f_t is needed to build 10-GBaud, 25-GBaud, 50-GBaud, and 100-GBaud 3-stage TIAs, where three stages is the minimum multi-stage configuration that forms a single-ended TIA with negative feedback. The calculated minimum technology f_t for each baud rate is shown in Table I. The baseline technology is the one that just satisfies the minimum f_t requirement. Considering the drawbacks of CMOS technology discussed in Section II-B, the technology actually used should be at least one that is more advanced than the baseline. As a result, at 10 GBaud and 25 GBaud, 90-nm and 28-nm CMOS technology, respectively, should be used before a 3-stage TIA begins to offer any benefit. At 50 GBaud and beyond, there are simply no CMOS technologies available that satisfy the requirements. This simple exercise shows that in order to make a 3-stage TIA with identical gain stages effective, a rather advanced technology must be used, which is either expensive or simply not yet available. This finding is in line with the suggestion that most high-speed TIAs are likely better served by single-stage amplifiers [7], [8].
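The f_t requirement can be estimated with a one-line calculation. The sketch below applies the f_t/BW ratio of 10 quoted above; the mapping from baud rate to TIA bandwidth (0.7 times the baud rate) is an assumption introduced here for illustration, so the numbers are not a reproduction of Table I.

```python
FT_OVER_BW = 10.0   # ratio quoted in the text for a 3-stage TIA
BW_PER_BAUD = 0.7   # ASSUMPTION: TIA bandwidth as a fraction of the baud rate


def min_ft_ghz(baud_gbd, bw_per_baud=BW_PER_BAUD, ratio=FT_OVER_BW):
    """Minimum technology f_t (GHz) for a 3-stage TIA at a given baud rate."""
    return ratio * bw_per_baud * baud_gbd


if __name__ == "__main__":
    for baud in (10, 25, 50, 100):
        print(f"{baud:3d} GBaud -> f_t of at least {min_ft_ghz(baud):.0f} GHz")
```

Under this assumed bandwidth-to-baud ratio, the required f_t grows linearly with the baud rate and reaches several hundred GHz at 50 GBaud and beyond, before any margin for the CMOS drawbacks of Section II-B is added.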
While cascading identical gain stages makes it difficult to build a TIA, a possible alternative is to use a stagger-tuned multi-stage amplifier with the help of inductive peaking, as proposed in [28]-[30]. The conceptual topology of the Type II approach is given in Fig. 4(a). The first stage (A1) is designed to purposefully scale down its bandwidth to a fraction of what is required, trading bandwidth for a larger gain. The following stages (A2 and A3) act as a composite EQ, which recovers the open-loop bandwidth to maintain loop stability and the shape of the closed-loop transfer function. The two-stage EQ also has the advantage of forming a wideband gain boost, which makes it easier to obtain a flat open-loop transfer function. Fig. 4(b) shows the corresponding 10-Gb/s TIA implementation example in a 180-nm CMOS technology [28], [30]. A1 is a low-bandwidth stage with larger gain, enabled by the active load, which relaxes the voltage headroom issue. A high-gain CMOS inverter stage is not used in A1 since its bandwidth would be too low to recover. A2 and A3 both implement inductively peaked EQs, whose center peaking frequencies are staggered to form the wideband peaking profile needed. Overall, the 3-stage stagger-tuned amplifier achieves more than 2X the total gain of its counterpart using identical gain stages with similar total bandwidth, even though the latter employs moderate inductive peaking for bandwidth extension. As a result, the stagger-tuned 3-stage TIA achieves 17X the transimpedance gain of a single-stage TIA. Besides the much larger gain-bandwidth product of the stagger-tuned amplifier, the relatively small output resistance of the third stage also makes it less prone to the output resistance effect described in (5), which tends to heavily limit the 1-stage TIA. A more detailed analysis and results can be found in [30].
Besides the advantage in gain-bandwidth product, another important merit of the Type II approach is its additional noise reduction effect [30]. With the large feedback resistance enabled by the high amplifier gain, the noise is dominated by the amplifier. The inductively peaked stagger-tuned amplifier also turns out to have low noise. The active load allows the front-end transistor to be biased with a large g_m to reduce noise. Meanwhile, the high gain of the A1 stage further suppresses the noise from the rest of the amplifier. The overall input-referred noise power of the TIA is almost 2X lower than that of the TIA using a 3-stage amplifier with identical gain stages. Last but not least, the sixth-order transfer function formed in this approach effectively filters out high-frequency noise, which tends to be quite large even at out-of-band frequencies.
The f_t of this technology is around 50 GHz, which does not favor a multi-stage approach for a 10-Gb/s TIA. Using the Type II approach, not only is a high transimpedance gain achieved, but excellent sub-microampere noise performance is also accomplished, which is even on par with some SiGe realizations. This shows the effectiveness of the Type II approach for realizing low-noise TIAs.

V. DESIGN GUIDELINES
So far, we have introduced and explored the Type I and Type II approaches to tackle the transimpedance limit for low noise. Since both approaches serve the same goal, which one should we use? In this section, we briefly compare the two approaches and provide some design guidelines and insights.
Similarity: Both approaches employ the low-pass stage plus equalizer stage topology, where the Type I approach applies it out of the TIA loop and the Type II approach uses it within the TIA loop. In either case, inductive peaking-based equalization is used in the examples, thanks to the rapid development of on-chip inductors and their wide availability.
Noise performance: The Type I approach only addresses the white noise, while the Type II approach tackles both the white noise and the colored noise from the amplifier. Therefore, the Type II approach may be used in noise-limited scenarios, while the Type I approach can be used for a moderate amount of noise reduction. In both cases, simulation and iteration should be carried out to select the optimum device parameters.
Time-domain performance: The Type I and Type II approaches form 4th- and 6th-order systems, respectively. In the ideal case where the bandwidth compensation within each approach is properly carried out, the eye quality of the Type I approach should be better than that of Type II. This is because a high-order system with multiple pairs of complex poles tends to exhibit more time-domain jitter and overshoot, degrading the eye quality. However, the results may vary and depend strongly on the actual implementation.
Design complexity: Neither approach is particularly difficult to realize, although the Type II approach is slightly more complex. First, its EQ has two stages with distinct center peaking frequencies, and the compensation takes place within the loop. Second, its system is 6th order, as opposed to 4th order in the Type I approach. In either approach, extensive design iterations are necessary.
Applicability: Recalling Table I and the development of the Type II approach from the multi-stage TIA, this approach is more effective when the technology speed is sufficiently larger than the required bandwidth. As a rule of thumb, the technology f_t should be 5 to 10 times the bandwidth target. On the other hand, the Type I approach does not have this limitation and can be applied irrespective of the f_t/BW ratio, suggesting wider applicability.
In sum, the Type II approach, while more effective in noise reduction, is relatively more difficult to realize and is better suited to relatively low-speed applications. The Type I approach, although it only addresses one type of noise, is easier to implement and can be applied to a wider range of applications. However, this conclusion is only a first-order qualitative simplification, and careful design and simulation are always needed before making a final call.

VI. CONCLUSION
The drive to use CMOS for high-speed optical front-end circuits, motivated by its superior economics, bandwidth density, and integration capability, is at odds with its inferior analog performance. This poses a dilemma, which at its core tends to be the transimpedance limit. This brief provides an effective toolkit comprising two approaches to solve this problem. The Type I approach embodies a divide-and-conquer methodology that separates the noise and bandwidth problems and solves them individually to break the transimpedance limit. The Type II approach employs a sophisticated multi-stage stagger-tuned amplifier to exceed the transimpedance limit. Both approaches can overcome the transimpedance limit with the goal of lower noise. In addition, a comparison between the two approaches is provided, forming a set of design guidelines that help designers adopt the approach that best suits their applications.