A 3D Integrated Energy-Efficient Transceiver Realized by Direct Bond Interconnect of Co-Designed 12 nm FinFET and Silicon Photonic Integrated Circuits

This article presents the first experimental demonstration of an energy-efficient electronic-photonic co-designed transceiver heterogeneously 3D co-integrated using high-density, low-parasitic Direct Bond Interconnect (DBI). The transceiver features 32-channel optical transceivers based on microdisk modulators and filters for a Wavelength Division Multiplexing (WDM) scheme. The silicon photonic chip is fabricated in AIM Photonics' integrated photonic technology, and the electronic transceiver chip is fabricated in the GlobalFoundries 12 nm FinFET process. The optical transmitter consumes 2.823 mW at 18 Gb/s with a 1.2 Vppd differential electrical modulation swing and achieves an extinction ratio of 7 dB. The optical receiver uses quarter-rate sampling via an Injection-Locked Oscillator (ILO) and a forward-clocking architecture, consumes 6.11 mW at 18 Gb/s, and achieves an Optical Modulation Amplitude (OMA) sensitivity of −20.3 dBm at 12 Gb/s with a photodiode responsivity of 0.8 A/W. The receiver can further operate at 25 Gb/s with a sensitivity of −17.01 dBm and 191 fJ/bit. The transceiver pair achieves 496 fJ/bit link efficiency at 18 Gb/s.


I. INTRODUCTION
INCREASING demand for energy-efficient data transmission in future high-performance computing (HPC) systems and data centers creates power, bandwidth, and scalability bottlenecks [1]. Silicon photonic (SiPh) transceivers have the potential to address these bottlenecks, provided that the photonic integrated circuit (PIC) and electronic integrated circuit (EIC) are optimally co-designed and co-integrated with minimal parasitics [2]. In particular, minimizing parasitic capacitance at the receiver input node can dramatically improve receiver Optical Modulation Amplitude (OMA) sensitivity, because the input-referred noise of the front-end circuitry is directly correlated with the input capacitance. Minimizing parasitic capacitance at the transmitter output node improves bandwidth and energy efficiency by reducing the capacitive load of the output driver [3]. While monolithic integration of electronic-photonic circuits [4] can minimize parasitic overheads, it compromises the flexibility and performance optimization of both the EIC and the PIC. Heterogeneous integration allows independent EIC and PIC optimization with low parasitics.
While recent work has shown 3D integrated EIC-PIC assemblies [5], those assemblies lacked integrated operational electronic circuitry and required an external high-speed data I/O interface to the photonic circuits. This article, first presented at the 2023 Optical Fiber Communication Conference (OFC) [6], utilizes direct bond interconnect (DBI), a state-of-the-art 3D integration solution that merges the top metal and dielectric layers of two wafers or dies [7], [8] and offers bond pitches as small as 2 μm. It is the first demonstration of a 3D DBI-integrated, co-designed silicon photonic transceiver combining a 12 nm FinFET EIC and a SiPh PIC.
Direct Bond Interconnect (DBI) technology [9] is a low-temperature hybrid bonding process that forms a dielectric-to-dielectric bond at room temperature and a metal-to-metal bond at an appropriately designed temperature below 400 °C. It is the key enabling technology for advanced products because of its unique ability to bond wafers at low temperatures and to bond pads ranging from 0.25 μm to 15 μm in diameter, with corresponding pitches from 0.5 μm to 40 μm. Generally, a low-temperature anneal at 150 °C to 400 °C can be used. The all-Cu interconnect across the bond interface provides good electrical performance and enhanced reliability.
Fig. 1 shows the DBI-integrated Wavelength Division Multiplexing (WDM) optical interconnect as part of the HPC module. The integration module can fully support open industry-standard chiplet architectures and communication protocols such as Universal Chiplet Interconnect Express (UCIe). The DBI-packaged WDM transceivers enable high-channel-density, long-distance, energy-efficient data communication between GPUs/ASICs on individual HPC modules, with the optical transceivers bridging directly to the GPUs/ASICs via the electrical transceivers on the EIC. In the proposed system, the EIC features full 8:1 SerDes transceivers, with drivers optimized for microdisk modulators and transimpedance amplifier (TIA) front-ends optimized to interface with low-capacitance waveguide photodetectors, where wavelength selection is performed by microdisk drop filters. The PIC features AIM Photonics' standard O-band microdisk modulator and a custom-designed O-band microdisk drop filter. With its compact footprint and high quality factor (Q), the microdisk resonator is a competitive candidate for WDM: multiple microdisk modulators can be integrated on a single waveguide, each modulating a selected wavelength, while occupying much less area than a conventional Mach-Zehnder Modulator (MZM).

II. EIC-PIC INTEGRATED SYSTEM ARCHITECTURE
Fig. 2 illustrates the integrated transceiver (TRX) platform, consisting of the CMOS EIC die flip-chip bonded to the SiPh PIC with DBI. The 1.5 mm × 1.5 mm TRX EIC is fabricated in the GlobalFoundries 12 nm FinFET process and features 32 Tx and Rx pairs. These TRX pairs include 8:1 serializing transmitters with high-speed microdisk modulator drivers, 1:8 deserializing receivers with TIA front-ends and low-complexity timing recovery, and thermal tuning control circuitry for the microdisk resonance wavelength. The optical transceiver circuits were carefully co-designed to fit within 20 μm vertically to match the DBI bonding pad pitch for integration with the PIC. The layout dimensions of the individual optical transmitter and receiver are 67 μm × 20 μm and 80 μm × 20 μm, respectively. The EIC also includes 50 Ω impedance-matched electrical transceivers to communicate with GPUs/ASICs. The 5.5 mm × 7.5 mm SiPh PIC was implemented in a custom 300 mm wafer-scale AIM Photonics silicon photonics run, with variations of custom-designed O-band microdisk modulators with integrated heaters, add-drop filters, and silicon-germanium (SiGe) photodetectors. The active photonic devices occupy only ∼2% of the PIC die area; the rest serves as a first-level optical interposer for the EIC, routing the signal and power nets to wirebond pads at the PIC periphery. The EIC die area is 5.45% of the PIC die area, taking advantage of the more advanced 12 nm FinFET process node. The EIC and PIC were integrated using DBI at Nhanced Semiconductors. The signals are wirebonded to a secondary organic laminate interposer, which is assembled onto the test PCB on pogo pins. Polarization-maintaining single-mode fibers are edge-coupled using v-grooves. Photos of the PIC, the EIC, the wafer-to-die 3D stack after DBI, and the final packaged transceiver module are shown in Fig. 3.
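The area figures above can be cross-checked directly from the quoted dimensions. The sketch below uses only numbers from the text; the constant and function names are hypothetical, and the summed cell area deliberately excludes routing, the PLL, and the electrical SerDes, so it is a lower bound on the transceiver footprint rather than a layout breakdown.

```python
# Die and per-channel dimensions quoted in Section II (values from the text).
EIC_MM = (1.5, 1.5)      # EIC die, mm x mm
PIC_MM = (5.5, 7.5)      # PIC die, mm x mm
OTX_UM = (67.0, 20.0)    # one optical transmitter cell, um x um
ORX_UM = (80.0, 20.0)    # one optical receiver cell, um x um
N_CHANNELS = 32


def eic_to_pic_area_ratio() -> float:
    """Fraction of the PIC die area covered by the EIC die."""
    return (EIC_MM[0] * EIC_MM[1]) / (PIC_MM[0] * PIC_MM[1])


def optical_trx_area_mm2() -> float:
    """Summed area of the 32 OTX + ORX cells only (no routing, PLL, or SerDes)."""
    per_pair_um2 = OTX_UM[0] * OTX_UM[1] + ORX_UM[0] * ORX_UM[1]
    return N_CHANNELS * per_pair_um2 * 1e-6  # um^2 -> mm^2


if __name__ == "__main__":
    print(f"EIC/PIC area ratio: {eic_to_pic_area_ratio():.2%}")  # 5.45%
    trx = optical_trx_area_mm2()
    print(f"32 OTX+ORX cells: {trx:.4f} mm^2 "
          f"({trx / (EIC_MM[0] * EIC_MM[1]):.1%} of the EIC die)")
```

The 20 μm cell height is what lets each transmitter or receiver sit on a single row of 20 μm pitch DBI pads.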
Based on established wirebond models [10], a single wirebond contributes 0.5 dB of loss at 100 GHz; however, coupling between wires limits signal transmission bandwidth in dense multi-wire bond packaging [11]. Wirebonds have previously been reported to behave as low-pass filters, with severe signal transmission limitations at frequencies beyond 35 GHz [12]. In our application, the wirebonds carry only slower ancillary I/O signals of up to 4 Gb/s and hence are not a source of bandwidth bottleneck or crosstalk. All high-speed signals faster than 4 Gb/s are routed through the DBI bonds, whose reduced interconnect parasitics result in minimal crosstalk. In our signal frequency range, the wirebonds exhibit a relatively flat reflection coefficient and less than 0.2 dB transmission loss. While denser wirebonding is possible, our application is limited by pad dimensions and bonding capability. A three-level staggered wirebonding strategy is used with an adjacent pad-to-pad pitch of 70 μm, and wire lengths are kept below 1 mm to minimize coupling through parasitic impedances.
A model of the DBI pad array including the full layer stack (FEOL+BEOL) of both the 12 nm FinFET EIC and the PIC is simulated. The model consists of a 3 × 3 pad array with the target signal pad at the center; the 8 neighboring pads are tied to ground to capture the worst-case coupling capacitance. The Ansys Q3D simulation shows a total capacitance of 5.5 fF for the center pad. In comparison, with the 12 nm FinFET EIC's FEOL+BEOL layer stack alone, simulation gives around 7.16 fF for micro-pillar packaging and ≥50 fF for a SnAg Controlled Collapse Chip Connection (C4) bump. Recent literature on Cu-pillar bonding reported a capacitance of 80 fF for the bonding interface [13]. The Ansys Q3D simulation therefore indicates that DBI has substantially lower parasitics than the alternatives, both in simulation and in reported literature. The bonding pitch is 20 μm pad-to-pad, with 10 μm diameter bond pads on the EIC side and 6.6 μm diameter bond pads on the PIC side. Fig. 4 shows the simulated receiver OMA sensitivity as a function of total input capacitance. We estimate that DBI yields a 75% reduction in total input capacitance relative to a C4 or micro-pillar bond: the PD capacitance plus package parasitics drops from 100 fF to 14 fF, with a 14 fF TIA input capacitance in both cases. This translates into a 6.1 dB improvement in RX OMA sensitivity, demonstrating the advantage of direct bonding over C4/micro-pillar bonding in a 3D packaging scheme. Overall, the optical receiver's OMA sensitivity improves at a rate of 5 dB/dec with the reduction of total input capacitance (including PD, bonding parasitics, and TIA input capacitance), or at 10 dB/dec if the TIA amplifier size and its input capacitance are fixed due to power budget limitations.
FIFO (First-In-First-Out) before being transmitted to the optical transmitters (OTX). All 8 optical TX in the group serialize the same 8-bit parallel data into full-rate data and modulate the designated wavelength via a microdisk modulator. The 8 optical receivers (ORX) in a data path group deserialize the received optical data into 8 parallel bits at 1/8 rate. The 8 electrical transmitters (ETX) can select the 8-bit output from one of the 8 optical RX and transmit the data to the external GPUs/ASICs.
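The capacitance reduction and sensitivity improvement quoted above for Fig. 4 can be reproduced with a few lines of arithmetic. This sketch assumes that the log-linear slopes stated in the text hold over the whole 28 fF to 114 fF range; the function names are hypothetical. Under that assumption, the 6.1 dB figure corresponds to the 10 dB/dec slope (TIA size fixed), while the 5 dB/dec slope would give about 3 dB.

```python
import math

TIA_INPUT_FF = 14.0  # TIA input capacitance, identical in both cases (from the text)


def total_input_cap_ff(pd_plus_package_ff: float) -> float:
    """Total RX input capacitance = PD + package parasitics + TIA input."""
    return pd_plus_package_ff + TIA_INPUT_FF


def sensitivity_gain_db(c_before_ff: float, c_after_ff: float,
                        slope_db_per_dec: float) -> float:
    """OMA sensitivity improvement assuming a log-linear sensitivity-vs-capacitance slope."""
    return slope_db_per_dec * math.log10(c_before_ff / c_after_ff)


c_c4 = total_input_cap_ff(100.0)   # C4 / micro-pillar case: 114 fF
c_dbi = total_input_cap_ff(14.0)   # DBI case: 28 fF
reduction = 1.0 - c_dbi / c_c4     # ~0.754
gain_fixed_tia = sensitivity_gain_db(c_c4, c_dbi, 10.0)   # ~6.10 dB
gain_scaled_tia = sensitivity_gain_db(c_c4, c_dbi, 5.0)   # ~3.05 dB

print(f"C_in: {c_c4:.0f} fF -> {c_dbi:.0f} fF "
      f"({reduction:.1%} reduction, {c_c4 / c_dbi:.2f}x)")
print(f"OMA gain: {gain_fixed_tia:.2f} dB at 10 dB/dec, "
      f"{gain_scaled_tia:.2f} dB at 5 dB/dec")
```

The same ratio, 114 fF / 28 fF ≈ 4.1, is the roughly 4× capacitance reduction cited in the conclusion.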

III. 12 NM FINFET EIC TOP-LEVEL ARCHITECTURE
The EIC's clock distribution network launches from the PLL located at the center of the chip. The PLL generates a differential quarter-rate clock, which is distributed to the optical TX and electrical RX via a network of cross-coupled CMOS inverter buffers. The electrical RX receives the clock directly from the distribution network, while the optical TX are clocked in 8-channel groups by an injection-locked oscillator (ILO), which receives its quarter-rate clock injection from the clock network. The optical transceiver implements a forward-clocking architecture. The 8th optical TX is designated as the forward-clock transmission channel, and its 8-bit input is programmed as a periodic quarter-rate clock pattern. The 8th optical RX serves as the forward optical clock receiving channel and is the source of the clock network (highlighted in blue) that distributes the source-synchronous clock to all 31 optical data receiving channels.
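Programming the 8-bit input as a quarter-rate clock pattern means that the serialized stream must repeat every 4 UI. A minimal sketch, assuming the word 11001100 (the exact pattern and bit order are not given in the text, and the names below are hypothetical), checks the resulting period and clock frequency:

```python
FWD_CLK_WORD = [1, 1, 0, 0, 1, 1, 0, 0]  # hypothetical 8-bit forwarded-clock word


def serialize(words):
    """Concatenate 8-bit parallel words into a full-rate bit stream (bit 0 first)."""
    return [bit for word in words for bit in word]


def fundamental_period_ui(bits):
    """Smallest shift p (in UI) such that bits[i] == bits[i + p] for all valid i."""
    n = len(bits)
    for p in range(1, n):
        if all(bits[i] == bits[i + p] for i in range(n - p)):
            return p
    return n


stream = serialize([FWD_CLK_WORD] * 4)     # 32 UI of forwarded clock
period_ui = fundamental_period_ui(stream)  # 4 UI, i.e. a quarter-rate clock
bit_rate_gbps = 18.0
clock_ghz = bit_rate_gbps / period_ui      # 4.5 GHz at 18 Gb/s
print(f"period = {period_ui} UI, forwarded clock = {clock_ghz} GHz")
```

Because the forwarded channel reuses the data transmitter unchanged, only this input word distinguishes the clock channel from a data channel.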

IV. ELECTRICAL TRANSCEIVER IMPLEMENTATION
Fig. 5(b) shows the architecture of the electrical transceiver, which serves as the interface between the 32-channel WDM transceiver and the HPC module's GPUs/ASICs. The electrical RX is implemented as a full-rate sampling architecture with clock and data recovery (CDR); it receives 1/8-rate data from the GPUs/ASICs with automatic sampling clock alignment via a phase interpolator (PI) based phase rotator. The electrical RX front-end uses a continuous-time linear equalizer (CTLE) with 4.5 dB peaking to compensate for the channel loss between the EIC and the GPUs/ASICs. The electrical TX begins with an 8-to-1 mux that selects the deserialized 1/8-rate data from one of the 8 optical RX. A single-ended-to-differential conversion circuit precedes the differential output driver. The Source-Series Termination (SST)

V. OPTICAL TRANSMITTER IMPLEMENTATION
The optical transmitter architecture is shown in Fig. 6(a). For efficient global clock distribution, a differential quarter-rate clock is distributed to 4 injection-locked oscillators (ILOs), each of which clocks a bundle of 8 transmitters. These ILOs generate 4-phase quarter-rate clocks that drive the final 4:1 serializer and, after passing through a divide-by-2 block, the initial 8:4 serializer. Per-channel buffers implement both clock phase quadrature error correction (QEC) and duty cycle correction (DCC). An on-chip PRBS-15 generator provides the 8-bit parallel input data to the initial 8:4 serializer. The data is then serialized to full rate in the final 4:1 mux, which uses minimal-transistor high-impedance slices driven by dynamic pulse gates. An inverter-based output driver with an AC-coupled on-chip bias-tee is used. Powered by a 0.7 V supply, the driver delivers a 1.2 Vppd electrical swing to the microdisk modulator after the capacitive voltage division of the AC coupling, achieving a maximum extinction ratio (ER) of 7 dB. The output driver is programmable between differential and single-ended modes, which could further reduce transmitter power consumption if a novel optical modulator with low electrical modulation swing and high extinction ratio becomes available. A forwarded-clock transmitter, a replica of the data transmission channel with its 8-bit input programmed as a quarter-rate clock pattern, forwards a clock signal to the receiver channels for low-complexity timing recovery. Fig. 6(b) and (c) show the microdisk modulator structure, based on the information in [16], and the 12 Gb/s optical eye diagram, which achieves an extinction ratio of around 7 dB.
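The 8:4 and 4:1 serializer tree fed by the PRBS-15 generator can be modeled at bit level. This sketch assumes the common PRBS-15 LFSR with taps at stages 15 and 14 (x^15 + x^14 + 1) and a lane mapping in which 8:4 lane k carries bits k and k + 4; the paper specifies neither, and all function names are hypothetical. It checks that the two-stage tree reproduces the original bit order and that the LFSR has the expected 2^15 − 1 = 32767-bit period.

```python
from itertools import islice


def prbs15_step(state):
    """One step of a Fibonacci LFSR with taps at stages 15 and 14 (x^15 + x^14 + 1).

    Returns (next_state, output_bit); the state is a non-zero 15-bit integer.
    """
    new_bit = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | new_bit) & 0x7FFF, new_bit


def prbs15(seed=0x7FFF):
    """Infinite PRBS-15 bit stream (hypothetical stand-in for the on-chip generator)."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    while True:
        state, bit = prbs15_step(state)
        yield bit


def prbs15_period(seed=0x7FFF):
    """Number of steps until the LFSR state repeats (2**15 - 1 for a maximal-length LFSR)."""
    state, steps = prbs15_step(seed)[0], 1
    while state != seed:
        state, _ = prbs15_step(state)
        steps += 1
    return steps


def serialize_8_to_4(word):
    """8:4 stage under an assumed mapping: quarter-rate lane k carries bits k and k + 4."""
    return [[word[k], word[k + 4]] for k in range(4)]


def serialize_4_to_1(lanes):
    """4:1 stage: emit lanes 0..3 in turn for each quarter-rate clock cycle."""
    return [lane[t] for t in range(len(lanes[0])) for lane in lanes]


bits = list(islice(prbs15(), 8 * 1000))                  # 1000 eight-bit words
words = [bits[i:i + 8] for i in range(0, len(bits), 8)]
serial = [b for w in words for b in serialize_4_to_1(serialize_8_to_4(w))]
assert serial == bits                                     # tree preserves bit order
print("PRBS-15 period:", prbs15_period())
```

In hardware the 8:4 stage runs at half the 4:1 stage's clock rate, which is why it is clocked through the divide-by-2 block.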

VI. OPTICAL RECEIVER IMPLEMENTATION
The optical receiver architecture is shown on the right side of Fig. 7(a). A multi-stage high-transimpedance TIA front-end with tunable bandwidth provides improved and consistent OMA sensitivity under variations of input capacitance and feedback resistance due to fabrication and packaging tolerances. Bandwidth tuning is achieved by a bank of programmable direct-feedback inverters that act as active inductors for bandwidth extension. An additional active-inductor voltage amplifier stage then drives 4 parallel samplers that perform the initial 1:4 deserialization. An operational transconductance amplifier (OTA) based offset cancellation loop with RC low-pass common-mode extraction cancels the common-mode offset at the samplers' input node, which arises because the common mode of the input photocurrent varies with the input optical swing. The four samplers are clocked by 4 ILO-generated quarter-rate clock phases; the ILO center frequency is tunable, which provides a programmable phase shift for low-complexity timing recovery. The final 4:8 deserializing stage follows. The forward-clock receiver channel has a similar front-end design but without active-inductor frequency peaking, since the clock is a single tone. The forwarded clock is converted to differential signaling and distributed differentially by the clock network before being injected into the ILO in each data receiver channel. Fig. 7 also shows the layout and optical test results of the custom-designed add-drop filters. Fig. 7(b) shows the layout of the custom add-drop filter, with a curved coupler section to improve fabrication tolerance, a trench to eliminate higher-order whispering gallery modes, and a P-doped diffusion resistor at the center of the microdisk that serves as a thermal heater for resonance wavelength tuning. Fig. 7(c) shows the through-port and drop-port spectra. The drop port shows a negligible loss of less than 0.1 dB. The measured Free Spectral Range (FSR) is 2.9 THz, which can safely cover 32 wavelengths with 80 GHz channel spacing. Fig. 7(d) shows the thermal wavelength tuning capability of the custom add-drop filter. The measured thermal tuning efficiency is 0.27 nm/mW (49 GHz/mW). The heater resistor is designed to target 50 Ω so that the current-mirror-based DAC of the resonance tuning circuitry on the 12 nm FinFET EIC has sufficient dynamic range despite the limited 1.8 V maximum supply of its thick-oxide output transistor.
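The programmable phase shift obtained by tuning the ILO center frequency is commonly described by Adler's model of injection locking, in which the steady-state phase satisfies sin φ = (f_free − f_inj)/f_lock inside the lock range. The sketch below uses that textbook model, which is not necessarily the model the authors used; the lock range and frequencies are hypothetical. It shows that detuning within the lock range shifts the output phase by up to ±90°, which for a quarter-rate clock corresponds to up to ±1 UI of data.

```python
import math


def ilo_phase_shift_deg(f_free_ghz: float, f_inj_ghz: float,
                        f_lock_ghz: float) -> float:
    """Steady-state ILO output phase relative to the injected clock (Adler's model).

    sin(phi) = (f_free - f_inj) / f_lock, valid only inside the lock range
    |f_free - f_inj| <= f_lock. The sign convention varies between references.
    """
    detune = f_free_ghz - f_inj_ghz
    if abs(detune) > f_lock_ghz:
        raise ValueError("free-running frequency is outside the lock range")
    return math.degrees(math.asin(detune / f_lock_ghz))


def phase_to_ui(phase_deg: float, ui_per_clock_period: int = 4) -> float:
    """Convert clock phase to data UI; one quarter-rate clock period spans 4 UI."""
    return phase_deg / 360.0 * ui_per_clock_period


# Hypothetical numbers: 18 Gb/s data -> 4.5 GHz quarter-rate clock, 0.1 GHz lock range.
for f_free in (4.42, 4.46, 4.50, 4.54, 4.58):
    phi = ilo_phase_shift_deg(f_free, 4.5, 0.1)
    print(f"f_free = {f_free:.2f} GHz -> phase {phi:+6.1f} deg = {phase_to_ui(phi):+.2f} UI")
```

The arcsine makes the phase-versus-detuning curve steep near the lock-range edges, so in practice only part of the ±90° range would be used.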
Lorentzian fitting of the spectrum shows that the custom microdisk add-drop filter for the receivers has a Q of 3361. Compared to the Q ≥ 5000 of the microdisk modulator, the reduced Q is due to the increased round-trip loss caused by the drop-port coupler. We have improved the coupler design of the add-drop disk to

A. Optical Transmitter Characterization
Fig. 8(a) shows the optical TX testbench. An O-band laser provides the continuous-wave (CW) optical input. A polarization controller at the input compensates for random polarization fluctuations. A Praseodymium-Doped Fiber Amplifier (PDFA) enhances the signal-to-noise ratio (SNR) after the output edge coupler. An Agilent 11982A optical-to-electrical (O/E) converter converts the optical signal into an electrical signal for the Digital Communication Analyzer's (DCA) electrical input. The clock synthesizer's trigger signal synchronizes the chip under test and the DCA. The clock synthesizer provides the half-rate clock for 6 Gb/s to 12.5 Gb/s operation, which is converted into differential signaling via a balun/hybrid. A pair of DC blocks AC-couple the differential clock onto the chip's clock input buffer, which has a self-biasing resistive network.
A thermal tuning circuit similar to those in prior art [14], [20], [21] is implemented on the 12 nm FinFET EIC for each channel. However, because the PIC uses an AIM standard microdisk cell, the microdisk modulator lacks the tuning efficiency needed for a broad tuning range: its relatively large heater resistance limits the current that the tuning circuit's output driver can deliver, since that driver operates from a 1.8 V supply with limited voltage headroom. During the experiment, thermal tuning and wavelength locking were therefore performed externally with DC supplies and monitored with an optical spectrum analyzer.
Fig. 8(b) shows the measured 12 Gb/s eye diagrams from multiple TX channels, with variation in eye opening due to fabrication-induced variation in resonance wavelength tunability and extinction ratio. The channels were modulated independently to acquire the eye diagrams. Crosstalk measurements showed 36 dB isolation between spectrally adjacent channels for 80 GHz channel separation and the target filter bandwidth. To test WDM transmission, two CW laser sources are coupled onto the fiber/waveguide via an optical combiner. Fig. 9(a) shows two modulators isolated and separated by 0.45 nm, corresponding to the 80 GHz channel spacing targeted by design. The two CW lasers were aligned to the two targeted resonance wavelengths. An optical bandpass filter, used to emulate the optical receiver's spectral response, is tuned to the designed receiver drop-filter bandwidth of 0.42 nm. This combination resulted in a measured crosstalk isolation of 36 dB, shown in Fig. 9(b). Fig. 10 shows the eye diagrams measured for simultaneous 12 Gb/s transmission from the adjacent modulators described above. Comparing simultaneous transmission with one transmitter turned off shows minimal impact on eye opening and jitter. Since the microdisk's heater resistor is connected to a 1.8 V transistor on the EIC, the externally applied tuning voltage cannot exceed 2 V without risking transistor breakdown and irreversible leakage within the EIC; this limits the resonance wavelength tuning range and the number of separable channels in this WDM testbench.
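The channel-plan numbers used in this section and in Section VI (80 GHz spacing and 0.45 nm, the 2.9 THz FSR, 0.27 nm/mW tuning, Q = 3361) are related by Δλ = λ²Δf/c and Q = λ/Δλ_FWHM. A minimal sketch, assuming an operating wavelength of 1300 nm (the exact wavelength is not stated), reproduces them approximately. At this wavelength the tuning efficiency converts to about 48 GHz/mW rather than the quoted 49 GHz/mW, a small difference that depends on the assumed wavelength; the FSR converts to about 16.3 nm, close to the 16.6 nm used in the tuning-power analysis.

```python
C_M_PER_S = 299_792_458.0
LAMBDA_NM = 1300.0  # assumed O-band operating wavelength (not stated exactly in the text)


def ghz_to_nm(df_ghz: float, lam_nm: float = LAMBDA_NM) -> float:
    """Frequency interval -> wavelength interval: d_lambda = lambda^2 * d_f / c."""
    return (lam_nm * 1e-9) ** 2 * (df_ghz * 1e9) / C_M_PER_S * 1e9


def nm_to_ghz(dl_nm: float, lam_nm: float = LAMBDA_NM) -> float:
    """Wavelength interval -> frequency interval: d_f = c * d_lambda / lambda^2."""
    return C_M_PER_S * (dl_nm * 1e-9) / (lam_nm * 1e-9) ** 2 / 1e9


N_CH, SPACING_GHZ, FSR_GHZ = 32, 80.0, 2900.0
Q_FILTER = 3361

print(f"80 GHz spacing = {ghz_to_nm(SPACING_GHZ):.3f} nm")        # ~0.451 nm
print(f"32 x 80 GHz    = {N_CH * SPACING_GHZ:.0f} GHz <= FSR {FSR_GHZ:.0f} GHz")
print(f"FSR 2.9 THz    = {ghz_to_nm(FSR_GHZ):.2f} nm")            # ~16.35 nm
print(f"0.27 nm/mW     = {nm_to_ghz(0.27):.1f} GHz/mW")           # ~47.9 GHz/mW
print(f"Q = {Q_FILTER} -> FWHM = {LAMBDA_NM / Q_FILTER:.3f} nm")   # ~0.387 nm
```

The 32 × 80 GHz = 2.56 THz grid leaves roughly 340 GHz of FSR unused, which is the margin behind "safely cover".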
Fig. 11(a) shows the measured eye diagram of the modulated optical output at 18 Gb/s using the on-chip PRBS-15 generator. At 18 Gb/s, bandwidth limitation and Intersymbol Interference (ISI) become more pronounced. We suspect that the TX output driver sees a larger capacitive load than the original target of 35 fF junction capacitance. A recent paper suggests that large input optical power across a microring modulator's PIN junction can induce photocurrent across the modulating junction: the modulator effectively becomes a photodiode, and the photocurrent, enhanced by heating from the resonance-tuning thermal resistor, can reach 100 μA [15]. Such photocurrent induces an IR drop across the on-chip bias-tee's large resistor and changes the biasing condition of the modulator. Therefore, a series of measurements was taken on both the stand-alone AIM Photonics O-band microdisk modulator and the DBI-packaged chip with 32 cascaded microdisk modulators on a single waveguide. These include probe measurements on the stand-alone microdisk to extract the 2-port I-V characteristics of all 6 possible pairs of the microdisk modulator's 4 ports (P+ anode, N+ cathode, Heater+, and Heater−), as well as measurements of the current across the modulating P+N+ junction when the laser passes through the microdisk modulator and the TX driver applies electrical modulation. These measurements, together with a CLEO 2013 paper on a microdisk modulator structure with a P-doped diffusion resistor for thermal wavelength tuning [16], led to the equivalent device model shown in Fig. 11(b).
The probe measurements revealed parasitic PIN diodes between the N+ cathode and the heater's two diffusion ports, as well as parasitic resistors between the P+ anode and the heater's diffusion regions, even though an intrinsic silicon (i-Si) layer separates the P+N+ modulating junction from the P-doped diffusion resistor. The physical mechanism of the bandwidth reduction remains to be determined. The possibilities are: (1) leakage current through the parasitic resistors induces an IR drop across the bias-tee and affects the modulation bandwidth; (2) the two parasitic diodes add junction capacitance and increase the load seen by the TX output driver; and (3) photocurrent across the parasitic PIN diodes, induced by large input optical power, similar to the microring behavior described in [15]. Also, note that the parasitic diode between the N+ cathode and Heater− (the wavelength tuning port) limits the tuning voltage headroom to below V(N+ cathode) + 0.7 V = 1.6 V, since forward-bias turn-on of this parasitic diode must be prevented.
A routing path across the package ties Heater+ to VSS, and a current path connects Heater− to the P+ anode through the PIP+ parasitic resistance. If a tuning voltage is applied to Heater−, the resulting current induces an IR drop across the bias-tee resistor and pulls up the voltage at the P+ anode. This reduces the reverse bias of the N+P+ junction and, because junction capacitance is voltage dependent, increases the junction capacitance. To examine this possible IR-drop-induced bandwidth reduction, Fig. 11(a) overlays on the measured optical eye a simulated bandwidth-limited 18 Gb/s eye, assuming that a 32.8 μA leakage current flows through the bias-tee's 21.33 kΩ resistor connected to the P+ anode and pulls up its voltage. This leakage current produces a 0.7 V offset in the reverse-biased modulation swing: the targeted TX modulation swing of 0 V to −1.4 V is shifted to 0.7 V to −0.7 V. According to C-V simulation of the PN junction's voltage dependence, the modulating junction capacitance can increase by up to 208% relative to the TX driver's original design target. The simulated eye diagram matches the bandwidth limitation/ISI observed in the measured optical eye diagram, as shown in Fig. 11(a).
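The 0.7 V bias offset follows directly from Ohm's law. The sketch below is plain arithmetic on values stated in the text, using the text's sign convention (0 V to −1.4 V is the reverse-biased target swing); it does not model the C-V dependence behind the 208% capacitance increase, which came from the authors' junction simulation.

```python
LEAK_CURRENT_A = 32.8e-6       # assumed leakage through the bias-tee (from the text)
BIAS_TEE_R_OHM = 21.33e3       # bias-tee resistor (from the text)
TARGET_SWING_V = (0.0, -1.4)   # targeted junction swing (from the text)

ir_drop_v = LEAK_CURRENT_A * BIAS_TEE_R_OHM                      # ~0.70 V
shifted_swing_v = tuple(v + ir_drop_v for v in TARGET_SWING_V)  # ~(+0.70, -0.70) V

# Portion of the 1.4 V peak-to-peak swing pushed out of reverse bias (above 0 V).
pp_v = TARGET_SWING_V[0] - TARGET_SWING_V[1]
forward_fraction = max(shifted_swing_v[0], 0.0) / pp_v           # ~0.5

print(f"IR drop = {ir_drop_v:.3f} V")
print(f"swing (0.00, -1.40) V -> ({shifted_swing_v[0]:+.2f}, {shifted_swing_v[1]:+.2f}) V")
print(f"fraction of swing no longer reverse-biased: {forward_fraction:.0%}")
```

With about half of the swing no longer reverse-biased, the junction spends much of each cycle in its high-capacitance region, consistent with the ISI seen in the 18 Gb/s eye.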

B. Optical Receiver Characterization
Fig. 12(a) shows the optical RX testbench. The clock synthesizer synchronizes both the PRBS-15 generator and the Bit-Error-Rate Tester (BERT) with the chip under test. An electrical DAC with programmable output serves as a PRBS generator that drives a discrete MZM, which modulates the CW light of the O-band laser into an optical PRBS stream with around 7 dB extinction ratio; this stream serves as the input to the chip under test. An Optical Delay Line (ODL) is inserted after the input path's PDFA to introduce a tunable optical delay for timing margin (bathtub curve) measurements. A Variable Optical Attenuator (VOA) attenuates the input optical power for OMA sensitivity measurements. The deserialized data from the optical RX is transmitted to the BERT via the SST electrical TX described in Section IV.
Fig. 12(b) and (c) show the timing margin and sensitivity of the optical RX. Because of the limitations of the available equipment, the RX is tested with an optical PRBS-7 pattern at 18 Gb/s and 25 Gb/s and with a PRBS-15 pattern at 12 Gb/s. With 12 Gb/s PRBS-15 input, the RX achieves BER = 10^−12 (the defined error-free margin) with a timing margin of 12% UI, and achieves −20.3 dBm OMA sensitivity with a photodiode responsivity of 0.8 A/W and an MZM modulation extinction ratio of 6.9 dB. At 18 Gb/s with optical PRBS-7 input, the OMA sensitivity is −18.82 dBm with a timing margin of 5.4% UI. The optical RX power consumption at 18 Gb/s is 6.11 mW, contributed by the TIA, the 4 quarter-rate samplers, the 4-to-8 deserializer, and the per-channel ILO, giving a power efficiency of 339 fJ/bit. The RX can further operate up to 25 Gb/s with −17.01 dBm OMA sensitivity, 12.5% UI timing margin, and a power efficiency of 191 fJ/bit. The optical receiver's data rate is limited by the sampling clock's jitter. To reach 25 Gb/s, the power supply of the per-channel ILO is raised from 0.8 V to 0.9 V to suppress random jitter, which results in a better timing margin than at 18 Gb/s.
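The receiver figures above can be cross-checked with unit conversions. This sketch uses only numbers from the text; the helper names are hypothetical. It shows that −20.3 dBm OMA corresponds to roughly 9.3 μW of optical modulation amplitude, or about 7.5 μA of photocurrent swing at 0.8 A/W, and that 191 fJ/bit at 25 Gb/s implies roughly 4.8 mW of receiver power.

```python
def dbm_to_uw(p_dbm: float) -> float:
    """Optical power in dBm -> microwatts."""
    return 10 ** (p_dbm / 10.0) * 1e3


def energy_per_bit_fj(power_mw: float, rate_gbps: float) -> float:
    """mW / (Gb/s) = pJ/bit; multiply by 1e3 for fJ/bit."""
    return power_mw / rate_gbps * 1e3


def ui_ps(rate_gbps: float) -> float:
    """Unit interval in picoseconds."""
    return 1e3 / rate_gbps


RESPONSIVITY_A_PER_W = 0.8

oma_uw = dbm_to_uw(-20.3)                  # ~9.33 uW OMA at 12 Gb/s
oma_ua = oma_uw * RESPONSIVITY_A_PER_W     # ~7.47 uA photocurrent swing
eff_18g = energy_per_bit_fj(6.11, 18.0)    # ~339 fJ/bit
p_25g_mw = 191.0 * 25.0 / 1e3              # power implied by 191 fJ/bit at 25 Gb/s
margin_12g_ps = 0.12 * ui_ps(12.0)         # 12% UI at 12 Gb/s

print(f"OMA {oma_uw:.2f} uW -> {oma_ua:.2f} uA photocurrent swing")
print(f"18 Gb/s: {eff_18g:.0f} fJ/bit; 25 Gb/s: {p_25g_mw:.2f} mW implied")
print(f"12% UI at 12 Gb/s = {margin_12g_ps:.1f} ps")
```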

C. Power Efficiency and Link Budget Analysis
Fig. 13(a) shows the power consumption breakdown of the individual channel/wavelength optical transceiver operating at 18 Gb/s. The transceiver power efficiency is 496 fJ/bit at 18 Gb/s. Table I shows the link budget [4] of the entire optical interconnect. The electrical power required to drive the laser for each wavelength is calculated to be 10.3 mW, given the measured −18.82 dBm OMA sensitivity at 18 Gb/s and the nominal optical loss of each component along the optical path, assuming a system power margin of 3 dB and 30% wall-plug efficiency.

TABLE I LINK BUDGET ANALYSIS AT 18 GB/S

In the WDM transceiver energy efficiency analysis, we used the data obtained from the experiments described above and assumed a photonic-crystal Kerr-resonator WDM comb source [25], [26], which achieved 65% optical power conversion efficiency, as the WDM optical source. Assuming the highest-efficiency O-band pump laser (DFB) at 45% [28], the WDM wall-plug efficiency becomes 65% × 45% ≈ 29%, i.e., ∼30%. Owing to the self-injection-locking nature of the WDM comb, this WDM source does not require a thermo-electric (TE) cooler (Case II in Table I). Alternatively, a DFB laser array source [27], achieving 36% peak efficiency, could also be used in a package without a TE cooler, with active laser wavelength stabilization by electro-optical wavelength control schemes. Wan et al. [29] use wavelength monitoring and feedback control to tune their WDM mode-locked laser, again without TE coolers. Lastly, the commercial approach used by Ayar Labs allows the modulator and add/drop-filter receiver wavelengths to tune and lock to the WDM laser wavelengths, such that, in principle, TE cooler wavelength stabilization is not necessary as long as the laser is mounted on a heatsink that dissipates heat conductively [30]. Table I includes the WDM optical source wall power both with and without a TE cooler (Case I and Case II, respectively).
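Table I itself is not reproduced here, but the chain from receiver sensitivity to laser wall power can be sketched. Working backward from the reported 10.3 mW with the 3 dB margin and 30% wall-plug efficiency, the sketch below infers the total laser-to-receiver budget (component losses plus the conversion between average laser power and OMA) that Table I implicitly allocates: about 20.7 dB. This is a derived number, not a value read from the table, and the function name is hypothetical.

```python
import math

COMB_CONVERSION_EFF = 0.65   # Kerr comb optical conversion efficiency (from the text)
PUMP_WPE = 0.45              # DFB pump wall-plug efficiency (from the text)
WPE_ASSUMED = 0.30           # rounded wall-plug efficiency used in the link budget
SENS_OMA_DBM = -18.82        # measured RX OMA sensitivity at 18 Gb/s
MARGIN_DB = 3.0              # system power margin
LASER_ELEC_MW = 10.3         # reported electrical laser power per wavelength


def laser_electrical_mw(budget_db: float, wpe: float = WPE_ASSUMED) -> float:
    """Electrical laser power per wavelength for a given laser-to-OMA budget (dB)."""
    optical_dbm = SENS_OMA_DBM + MARGIN_DB + budget_db
    return 10 ** (optical_dbm / 10.0) / wpe


combined_wpe = COMB_CONVERSION_EFF * PUMP_WPE                       # 0.2925, i.e. ~30%
laser_optical_dbm = 10 * math.log10(LASER_ELEC_MW * WPE_ASSUMED)    # ~4.90 dBm
implied_budget_db = laser_optical_dbm - (SENS_OMA_DBM + MARGIN_DB)  # ~20.7 dB

print(f"comb + pump wall-plug efficiency: {combined_wpe:.4f}")
print(f"laser optical power: {laser_optical_dbm:.2f} dBm per wavelength")
print(f"implied laser-to-OMA budget: {implied_budget_db:.2f} dB")
```

Because the relation is exponential in the budget, each additional dB of loss in Table I would raise the 10.3 mW figure by about 26%.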
Regarding the power required to tune the resonance wavelength of the microdisk modulators and filters, since thermal tuning can only red-shift the resonance, two possible resonance tuning schemes are discussed. The first is the "wavelength shuffling" scheme proposed in [17], which tunes each microdisk resonance to the nearest laser wavelength instead of to a pre-designated laser wavelength. The maximal red shift when tuning to the nearest laser is thus one channel spacing, 0.519 nm for 32-channel WDM with FSR = 16.6 nm. With a tuning efficiency of 0.27 nm/mW, the maximal tuning power for each microdisk is 1.921 mW.
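The worst-case tuning power of the shuffling scheme is one channel spacing divided by the tuning efficiency. The sketch below reproduces that number and, to illustrate why the bound holds, draws hypothetical resonance offsets (Gaussian with σ = 0.5 nm, borrowing the process spread quoted for the 3σ scheme) and computes the red shift to the next laser line at or above each resonance; the helper name is hypothetical.

```python
import random

FSR_NM = 16.6
N_WDM = 32
TUNE_NM_PER_MW = 0.27


def red_shift_to_next_laser_nm(offset_nm: float, spacing_nm: float) -> float:
    """Red shift needed to reach the next laser line at or above the resonance.

    offset_nm is the resonance position relative to an arbitrary laser line
    (either sign); laser lines sit at integer multiples of spacing_nm.
    """
    return (-offset_nm) % spacing_nm


spacing_nm = FSR_NM / N_WDM                  # ~0.519 nm channel spacing
max_tune_mw = spacing_nm / TUNE_NM_PER_MW    # ~1.921 mW worst case per microdisk

rng = random.Random(0)
shifts = [red_shift_to_next_laser_nm(rng.gauss(0.0, 0.5), spacing_nm)
          for _ in range(10_000)]
mean_tune_mw = sum(shifts) / len(shifts) / TUNE_NM_PER_MW

print(f"spacing {spacing_nm:.3f} nm, worst-case tuning {max_tune_mw:.3f} mW")
print(f"largest sampled shift {max(shifts):.3f} nm, mean tuning {mean_tune_mw:.2f} mW")
```

The average shuffling power is roughly half the worst case; the link efficiency quoted below uses the worst case.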
The second scheme places each microdisk's nominal resonance wavelength 3σ below its pre-designated laser wavelength (λ_microdisk = λ_laser − 3σ). The red-shift-only thermal tuning circuit can then cover the full ±3σ (6σ) span of the Gaussian distribution of microdisk resonance process variation, with an average tuning power equal to 3σ divided by the thermal tuning efficiency. The microdisk resonance wavelength is primarily affected by global non-uniformity of the device layer thickness across the SOI wafer, and the 32 microdisks of the PIC are clustered in an area of approximately 1.2 mm × 0.5 mm. For the PIC in this work, the standard deviation of the layer height, σ_height, can be assumed to be no larger than 0.1 nm. Monte Carlo simulation based on this constraint gives a standard deviation of σ = 0.5 nm for the custom-designed microdisk resonance wavelength, and the average tuning power is calculated to be 5.556 mW. Fig. 13(b) and (c) show the overall link power breakdown, including the laser driving electrical power calculated from the link budget analysis and the microdisk tuning power at both TX and RX for the two schemes discussed above. At 18 Gb/s, the overall link efficiency is 1.282 pJ/bit for the wavelength shuffling scheme and 1.686 pJ/bit for the 3σ tuning scheme. Table II summarizes and compares the optical transmitter and receiver with prior art.
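Both link-efficiency figures can be reproduced by adding TX, RX, laser, and the tuning power of one modulator and one drop filter, then dividing by the data rate. This decomposition is our assumption, but it recovers 1.282 pJ/bit and 1.686 pJ/bit from the numbers above. The sketch also checks the 3σ average tuning power with a small Monte Carlo; the function names and the seed are hypothetical.

```python
import random

TUNE_NM_PER_MW = 0.27
SIGMA_NM = 0.5
SHUFFLE_MW = (16.6 / 32) / TUNE_NM_PER_MW       # worst case, wavelength shuffling (~1.921 mW)
THREE_SIGMA_MW = 3 * SIGMA_NM / TUNE_NM_PER_MW  # average, 3-sigma scheme (~5.556 mW)

TX_MW, RX_MW, LASER_MW, RATE_GBPS = 2.823, 6.11, 10.3, 18.0


def mc_mean_red_shift_nm(sigma_nm: float, n: int = 100_000, seed: int = 1) -> float:
    """Mean red shift when each resonance is placed 3 sigma below its laser.

    A resonance with process error delta ~ N(0, sigma) needs a shift of
    3*sigma - delta; the rare cases beyond +3 sigma are clipped to zero
    because red-shift-only tuning cannot reach them.
    """
    rng = random.Random(seed)
    return sum(max(3 * sigma_nm - rng.gauss(0.0, sigma_nm), 0.0) for _ in range(n)) / n


def link_pj_per_bit(tune_mw_per_disk: float) -> float:
    """(TX + RX + laser + tuning of one modulator and one filter) / rate; mW per Gb/s = pJ/bit."""
    total_mw = TX_MW + RX_MW + LASER_MW + 2 * tune_mw_per_disk
    return total_mw / RATE_GBPS


mc_tune_mw = mc_mean_red_shift_nm(SIGMA_NM) / TUNE_NM_PER_MW
print(f"3-sigma Monte Carlo mean tuning power: {mc_tune_mw:.3f} mW")
print(f"TRX only:         {(TX_MW + RX_MW) / RATE_GBPS * 1e3:.0f} fJ/bit")
print(f"shuffling scheme: {link_pj_per_bit(SHUFFLE_MW):.3f} pJ/bit")
print(f"3-sigma scheme:   {link_pj_per_bit(THREE_SIGMA_MW):.3f} pJ/bit")
```

The 0.404 pJ/bit gap between the schemes is entirely the extra tuning power, 2 × (5.556 − 1.921) mW, spread over 18 Gb/s.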

VIII. CONCLUSION
We have demonstrated a co-designed electronic-photonic integrated transceiver with advanced 3D DBI integration and co-packaged optics. The high-density 3D direct bonding of DBI allows 32 electrical and optical transceivers to be integrated within a compact 1.5 mm × 1.5 mm footprint. The electrical and optical transceiver architectures, which serve as interconnects bridging multiple HPC modules, are presented. The optical transmitter ISI induced by microdisk device parasitics is analyzed, and a simulation based on this analysis matches the measured 18 Gb/s eye diagram. The reduced parasitics of DBI packaging enable an OMA sensitivity of −20.3 dBm at 12 Gb/s and an optical transceiver pair energy efficiency of 496 fJ/bit at 18 Gb/s. The optical receiver can operate at 25 Gb/s at the cost of reduced OMA sensitivity (−17.01 dBm). The link budget analysis indicates a laser driving power of 10.3 mW and an optical interconnect power efficiency of 1.282 pJ/bit when including the laser driver and the wavelength-shuffling resonator tuning scheme. This work presents the first demonstration of a 3D DBI-integrated EIC-PIC transceiver. The close integration of co-designed 12 nm CMOS EIC and silicon photonic integrated circuits enabled by 3D DBI achieves a 4× reduction in total receiver input capacitance, leading to a 6.1 dB improvement in receiver sensitivity and reducing the laser power requirement by the same amount. Together with additional benefits of 3D DBI, such as lower EMI noise, and the resonant microdisk electro-optical modulator, this contributes to the record-high data link power efficiency. The energy-efficient, compact 3D integrated platform, compatible with the state-of-the-art silicon CMOS ecosystem, indicates viable applications in future high-performance computing and data center systems.
Mehmet Berkay On received the B.S. degree in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 2018. He is currently working toward the Ph.D. degree in electrical and computer engineering with the University of California, Davis, CA, USA. His research interests include energy-efficient photonic neuromorphic computing, RF-photonic signal processing, fiber-optic communication/networking, and quantum networks.

David Scott, CEO, graduated from UCLA in 1997, where he invented the Traveling Wave Heterojunction Phototransistor. He contributed a chapter on the physical properties of the phototransistor to the popular book InP HBTs: Growth, Processing, and Applications, published in 1995 and edited by B. Jalali and S. J. Pearton. From 1997 to 2000, he continued developing high-speed InP-based photoreceivers and optical modulators with TRW for ultra-wideband optical communication systems for space-based applications. A 40 Gbps photoreceiver that Dr. Scott designed and manufactured was flown on a space shuttle mission in 2001. In addition to his technical accomplishments with TRW, he was also the Plant Materials and Processes Manager for the Photonics Technology Department. In 2001, he co-founded VSK Photonics with Drs. Timothy Vang and Srinath Kalluri. VSK Photonics was funded by $26.5 M of equity financing along with $10 M in debt financing. From 2001 to 2004, he was the VP of Engineering with VSK Photonics and managed the MMIC design, packaging, testing, and device modeling groups. In 2004, VSK Photonics merged with Archcom Technology, and Dr. Scott drove the operations and technology transfer. In 2007, he was promoted to CTO of Archcom, where he was in charge of all receiver and transmitter module operations, production, new product development, customer support, and defense contracts. In 2012, he was a key participant in the acquisition of Archcom by Hisense Broadband Multimedia Technologies (HBMT). The combined companies have more than 2500 employees and more than $400 M in revenues. With HBMT, he was VP of Engineering for its US R&D Center. In 2013, he founded Optelligent, where he serves as CEO. Optelligent provides design and assembly engineering services with quick turnaround times. Optelligent's capabilities include precision optical die-to-die placement within 1 µm, precision wire bonding to control port impedances, and sub-micron active optical component alignment.
Robert Patti received the BSEE/CS and BSPH degrees from Rose-Hulman Institute of Technology, Terre Haute, IN, USA. He is the owner and President of NHanced Semiconductors. He founded ASIC Designs Inc., an R&D company specializing in high-performance systems and ASICs. During his 12 years with ASIC Designs, he participated in the design of over 100 chips. Tezzaron Semiconductor grew from that company, with Bob as its CTO, and became a leading force in 3D-IC technology. Tezzaron built its first working 3D-ICs in 2004. NHanced Semiconductors was spun out of Tezzaron to further advance and develop 2.5D/3D technologies, chiplets, die and wafer stacking, and other advanced packaging. He was the recipient of the SEMI Award for North America in 2009, served as Vice-Chairman of JEDEC's DDRIII / Future Memories Task Group, and holds 18 US patents, numerous foreign patents, and many more pending patent applications in deep sub-micron semiconductor chip technologies.
Yang-Hang Fan (Student Member, IEEE) received the B.S. degree in engineering and system science and the M.S. degree from the Institute of Electronics Engineering, National Tsing Hua University, Hsinchu, Taiwan, in 2007 and 2009, respectively. He is currently working toward the Ph.D. degree in electrical engineering with Texas A&M University, College Station, TX, USA. From 2011 to 2015, he was with Faraday Technology, Hsinchu, where he worked on the design of mixed-signal integrated circuits for high-speed serial data communication. In 2018, he was a Research Associate Intern with Hewlett Packard Labs, Palo Alto, CA, USA, where he worked on high-speed optical input-output (I/O) architecture. His research interests include mixed-signal integrated circuits and low-power high-speed circuits for electrical and optical communication.

Fig. 2. System architecture, transceiver layout, and top and cross-sectional views of the DBI-packaged optical transceiver module.

Fig. 4 shows the simulated receiver OMA sensitivity as a function of total input capacitance. We estimate that DBI results in a 75% reduction in total input capacitance (PD capacitance

Fig. 5(a) shows the die photo of the 12 nm FinFET EIC with the optical and electrical transceiver channel blocks labeled, and the top-level architecture diagram of the EIC on the right. The EIC consists of 32 optical and electrical transceiver pairs, organized into 4 groups of electrical-optical data paths, each containing 8 electrical and optical transceiver pairs. In each data path group, the 8 electrical receivers (ERX) acquire 1/8-rate data from external GPUs/ASICs. The sampled 8-bit parallel data are synchronized to the same clock edge via

Fig. 5. (a) Die photo of the 12 nm FinFET EIC with the floorplan labeled, and top-level architecture indicating the clock distribution network and data path. (b) Block diagram of the electrical transceiver.

Fig. 6. (a) Block diagram of the optical transmitter bank with 8 data channels and 1 replica clock channel, (b) the microdisk modulator device structure based on [16], and (c) 12 Gb/s optical eye diagram.

Fig. 7. (a) Block diagram of optical receiver bank with 8 data channels and 1 forward-clock receiving channel, and custom-designed microdisk filter's (b) layout, (c) through and drop port spectra showing the FSR, and (d) resonance wavelength tuning within a single FSR window.

Fig. 8. (a) Optical transmitter test bench, and (b) 12 Gb/s optical eye diagrams from multiple TX channels. All scope plots share a y-scale of 100 mV/div and an x-scale of 16.5 ps/div. The device designs are identical, but fabrication variations cause different extinction ratios and heater tunability, leading to varying performance and eye openings.

Fig. 9. (a) Two transmitters aligned to a spacing of 0.45 nm (80 GHz) to replicate adjacent channels at the target channel spacing, and (b) 36 dB crosstalk isolation measured with the CW lasers aligned to the channel resonances and the optical filter tuned to the designed filter bandwidth.
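Fig. 9 pairs a 0.45 nm wavelength spacing with 80 GHz. For small spacings, Δf ≈ c·Δλ/λ², so the two numbers together fix the carrier wavelength. The sketch below performs that conversion in both directions; the helper names are hypothetical, and because the laser wavelength is not stated in this section, the roughly 1.3 µm it implies should be read as a consistency check rather than a specification.

```python
"""Convert a WDM channel spacing between wavelength and frequency.

Uses the small-spacing approximation |df| ~= c * dlambda / lambda**2.
"""
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum, m/s


def spacing_hz(d_lambda_m: float, lambda_m: float) -> float:
    """Frequency spacing corresponding to d_lambda_m around carrier lambda_m."""
    return C_M_PER_S * d_lambda_m / lambda_m ** 2


def carrier_wavelength_m(d_lambda_m: float, d_f_hz: float) -> float:
    """Carrier wavelength at which d_lambda_m corresponds to d_f_hz."""
    return math.sqrt(C_M_PER_S * d_lambda_m / d_f_hz)


if __name__ == "__main__":
    lam = carrier_wavelength_m(0.45e-9, 80e9)
    print(f"0.45 nm <-> 80 GHz implies lambda ~= {lam * 1e9:.0f} nm")
    for carrier_nm in (1310.0, 1550.0):
        ghz = spacing_hz(0.45e-9, carrier_nm * 1e-9) / 1e9
        print(f"0.45 nm at {carrier_nm:.0f} nm -> {ghz:.1f} GHz")
```

The same 0.45 nm spacing would correspond to only about 56 GHz near 1550 nm, which is why a wavelength spacing should always be quoted together with its band.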

Fig. 10. Crosstalk impact from adjacent modulator transmission. (a) Measured 12 Gb/s eye diagram with both modulators transmitting simultaneously and (b) with one modulator turned OFF. The comparison shows minimal impact on the eye opening and the measured jitter.

Fig. 11. (a) Output optical eye diagram at 18 Gb/s overlaid with the simulated bandwidth-limited eye diagram. The y-scale is 67.7 mV/div, and the x-scale is 11.1 ps/div. (b) Microdisk modulator device structure based on [16] and its equivalent device model, with the on-EIC bias-tee and the current leakage path indicated by a green dotted line.
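Fig. 11(a) overlays the measured 18 Gb/s eye with a simulated bandwidth-limited eye. As a generic illustration of how limited bandwidth closes an NRZ eye, the sketch below passes a random bit sequence through a single-pole low-pass filter and reports the worst-case vertical opening. It is not the device model of Fig. 11(b): the single time constant `tau_ps`, the helper names, and the end-of-UI sampling point are assumptions chosen for illustration. For a single pole, the worst-case opening approaches 1 − 2·exp(−UI/τ), so the eye closes quickly once τ becomes a sizable fraction of the roughly 55.6 ps UI at 18 Gb/s.

```python
"""Single-pole (RC) model of a bandwidth-limited NRZ eye.

Hypothetical sketch: one time constant tau_ps stands in for all driver,
interconnect, and device parasitics; it is not the paper's device model.
"""
import math
import random


def rc_filter_nrz(bits, ui_ps, tau_ps, samples_per_ui=32):
    """Filter an NRZ sequence (levels 0/1) through a first-order low-pass.

    The update y += alpha * (x - y) with alpha = 1 - exp(-dt/tau) is the
    exact discrete-time response for an input held constant over each step.
    """
    dt = ui_ps / samples_per_ui
    alpha = 1.0 - math.exp(-dt / tau_ps)
    y = 0.0
    out = []
    for b in bits:
        x = 1.0 if b else 0.0
        for _ in range(samples_per_ui):
            y += alpha * (x - y)
            out.append(y)
    return out


def vertical_eye_opening(bits, ui_ps, tau_ps, samples_per_ui=32):
    """Worst-case opening: lowest sampled '1' minus highest sampled '0'.

    Each bit is sampled at its last point, where a first-order response is
    closest to its target level. The first 8 bits are skipped as start-up.
    """
    wave = rc_filter_nrz(bits, ui_ps, tau_ps, samples_per_ui)
    ones, zeros = [], []
    for i in range(8, len(bits)):
        v = wave[(i + 1) * samples_per_ui - 1]
        (ones if bits[i] else zeros).append(v)
    return min(ones) - max(zeros)


if __name__ == "__main__":
    rng = random.Random(7)
    bits = [rng.randint(0, 1) for _ in range(2000)]  # pseudo-random, not a true PRBS
    ui_ps = 1e3 / 18.0  # one UI at 18 Gb/s
    for tau in (10.0, 20.0, 40.0):
        eye = vertical_eye_opening(bits, ui_ps, tau)
        print(f"tau = {tau:4.0f} ps -> normalized eye opening {eye:.3f}")
```

A single pole is the simplest model that reproduces the qualitative trend; a faithful match to Fig. 11(a) would need the full equivalent circuit of Fig. 11(b), including the bias-tee and leakage path.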

Fig. 12. (a) Optical receiver test bench. (b) Measured timing margin and (c) OMA sensitivity of the optical receiver.

Fig. 13. (a) Power consumption breakdown of an individual channel/wavelength optical transceiver at 18 Gb/s. (b) Link power efficiency at 18 Gb/s based on the link budget analysis of Table I with a passively cooled laser and the wavelength-shuffling tuning scheme of [17], and (c) based on tuning of the 3σ wavelength variation.

Ankur Kumar (Graduate Student Member, IEEE) received the B.E. (Hons.) degree in electrical and electronics engineering and the M.Sc. (Hons.) degree in mathematics from the Birla Institute of Technology and Science, Pilani, India, in 2014, and the M.S. degree in electrical engineering from Texas A&M University, College Station, TX, USA, in 2018. He is currently working toward the Ph.D. degree in electrical engineering with Texas A&M University. In 2014, he was a Design Intern with STMicroelectronics Pvt. Ltd., Greater Noida, India. From 2014 to 2016, he was a Senior Systems Engineer with Hewlett Packard Enterprise, Bangalore, India. In 2018, he was a Design Intern with Texas Instruments Incorporated, Duluth, GA, USA. In 2020, he was a CMOS Design Research Intern with Hewlett Packard Labs, Milpitas, CA, USA. His research interests include the design of high-speed and low-power circuits for optical and electrical communication and clock and data recovery circuits.

Hyungryul Kang received the B.S. degree in electrical and electronics engineering from Chung-Ang University, Seoul, South Korea, in 2008 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2010. He is currently working toward the Ph.D. degree in electrical engineering with Texas A&M University, College Station, TX, USA. From 2010 to 2018, he was with Samsung Display, Giheung, South Korea, where he was a Staff Engineer in the Display Electronic Development Department. His research interests include mixed-signal integrated circuits, high-speed circuits for electrical and optical communication, and clock and data recovery circuits.

Il-Min Yi received the B.S., M.S., and Ph.D. degrees in electronic and electrical engineering from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2007, 2010, and 2015, respectively. From 2015 to 2016, he was a Postdoctoral Researcher with POSTECH. From 2017 to 2020, he was with Device Technology Labs, Nippon Telegraph and Telephone Corporation, Tokyo, Japan, where he focused on the design of high-speed analog-to-digital converter circuits. Since 2020, he has been involved in the design of high-speed electrical and optical link circuits with Texas A&M University, College Station, TX, USA. His research interests include high-speed serial/parallel links, high-speed ADC circuits, and signal integrity.

Dedeepya Annabattuni received the B.E. (Hons.) degree in electronics and communications engineering from the Birla Institute of Technology and Science, Hyderabad, India, in 2014, and the M.S. degree in electrical engineering from Texas A&M University, College Station, TX, USA, in 2021. From 2014 to 2019, she was an Analog and Mixed Signal Layout Engineer in the High-Speed SERDES IP group, Synopsys, Hyderabad, India. In summer 2020, she was a SERDES Circuit Design Intern with Qualcomm Inc., San Diego, CA, USA. In June 2021, she joined Cadence Design Systems, Cary, NC, USA, as a Circuit Design Engineer and is currently working on receiver circuits in the SERDES IP group.

Yuanming Zhu received the B.S. degree in electronic science and technology from Wuhan University of Technology, Wuhan, China, the M.S. degree in integrated circuit engineering from the University of Chinese Academy of Sciences, Beijing, China, and the Ph.D. degree in electrical engineering from Texas A&M University, College Station, TX, USA. He was a Research Associate Intern with Hewlett Packard Lab, Palo Alto, CA, USA, in 2019 and an analog mixed-signal design Intern with Marvell, Santa Clara, CA, USA, in 2020. Since 2022, he has been with Intel Lab, Hillsboro, OR, USA, as a Research Scientist. His research interests include high-speed analog-to-digital converters, high-speed electrical links, and silicon photonics.

TABLE II
OPTICAL TRANSMITTER AND RECEIVER SUMMARY AND COMPARISON FOR NRZ/PAM4 SIGNALLING SCHEMES