Wafer-level testing of inverse-designed and adjoint-inspired dual layer Si-SiN vertical grating couplers

Recently, silicon photonics foundries started providing access to new dielectric stacks which can be utilized to reduce optical I/O losses. For example, in a hybrid c-Si/SiN platform, inverse design techniques can be used to create novel dual layer grating coupler (GC) designs which, in simulations, reach state-of-the-art performance. In this paper, we experimentally validate such designs for perfectly vertical single-polarization GCs in the O-band consisting of a single-etch c-Si layer with a patterned SiN overlay, fabricated using a 193 nm DUV immersion lithography process on 300 mm wafers. Here, we investigate designs generated by two different design paradigms: inverse design based on the adjoint method and adjoint-inspired design. Using wafer-level testing, we experimentally demonstrate a record low median insertion loss (IL) of 1.3 dB (with an interquartile range of ∼0.1-0.2 dB) for perfectly vertical coupling in DUV lithography compatible devices, which is a ∼0.5 dB improvement over previously demonstrated single-layer, single-etch c-Si 0 deg GCs.


Introduction
In various applications, optical I/O in silicon photonics chips typically has a significant impact on the overall loss budget. For instance, silicon photonic interconnects for future exascale high performance computing (HPC) networks incur losses at several fiber-chip transitions, especially if off-chip lasers are used, and a reduction in grating coupler (GC) loss would consequently result in an overall improvement of the link's energy-efficiency [1][2][3]. Unsurprisingly, GCs have been a long-term research interest of the silicon photonic research community, with many advances towards low-loss, wide-bandwidth and low-reflection device geometries that are compatible with volume fabrication [4][5][6]. In recent years, novel design methodologies based on inverse design and machine learning have been proposed to reduce coupling losses in GCs while respecting fabrication constraints present in, e.g., available deep ultraviolet (DUV) lithography process flows [7][8][9][10][11][12][13][14]. Interestingly, these more advanced design techniques facilitate the usage of more degrees of freedom in the design, allowing designers to go beyond traditional single-layer, single-etch designs and leverage dual etch [10][14][15][16][17][18][19][20][21] or dual layer fabrication flows [7][11][22][23][24][25]. Additionally, a reduction in the minimal feature size would allow the usage of sub-wavelength (SWL) features [19][20][26][27][28], although not all recent SWL designs are compatible with the constraints of scalable DUV immersion lithography [26].
In this paper, we apply advanced design techniques to GC design in an emerging hybrid c-Si/SiN layer stack [29][30][31][32][33][34][35]. Specifically, we investigate perfectly vertical GCs as the 0 deg coupling might simplify the mechanical design of single-mode connectors in datacom or HPC interconnects [3,11]. Here, we will leverage our previous theory paper [11] which showed how a SiN-layer can improve coupling efficiency (and reduce reflection) over a pure c-Si vertical GC design. We will then validate the previously proposed designs, which were obtained using inverse design, and compare them with alternative adjoint-inspired designs, which we proposed in our prior work on single-layer, single-etch c-Si GCs discussed in [36], but adapt in this paper to dual layer designs. For this purpose, we have created several design of experiments (DOEs) for dual layer (c-Si+SiN) single polarization GC designs compatible with the constraints imposed in a 193 nm DUV immersion lithography process developed for 300 mm wafers (section 2). Subsequently, we have performed 5-die measurements and data analysis on six different wafers (section 3). Furthermore, we have performed full wafer-level testing (WLT) measurements and data analysis on the two best performing wafers (section 4). On these two wafers (66 dies/wafer), using the calibrated measurement procedure developed in [36], we experimentally obtain a record low median 1.3 dB insertion loss (IL) for single polarization perfectly vertical dual layer GCs in the O-band. While the proposed vertical GC is primarily designed targeting single-mode connectors in datacom or HPC interconnects, our results might inspire advances in related optical I/O approaches in applications as diverse as coupling to multi-mode fibers [16,18], using gratings [14] or photonic crystal waveguides [37] as micro-antennas for photonic beam-steering (useful for lidar or remote sensing [21]), optical MRAM access [38], grating coupler lasers [39] and beyond.

Dual layer Si+SiN vertical grating couplers (GCs)
In our previous work [11], simulations of inverse designed vertical GCs using the adjoint method show that an additional patterned SiN layer can lead to a ∼0.5 dB improvement in IL compared to single layer vertical GCs (resulting in only 0.52 dB peak IL for certain dielectric layer stacks). Importantly, the proposed devices were designed using feature sizes compatible with scalable 193 nm DUV immersion lithography. Subsequently, in [36], different DOEs of single-layer vertical GCs were fabricated on a c-Si 300 mm wafer and analyzed using wafer-level testing. There, the best-performing single-layer c-Si design has a median 1.82 dB experimental insertion loss (IL), compared to a simulated ∼1.3 dB IL. In this paper, to minimize I/O losses, we aim to determine experimentally whether perfectly vertical GCs with patterned SiN features are (1) sufficiently fabrication robust to be compatible with volume production and (2) can reliably outperform their current single-layer alternatives in terms of IL. Before diving into the experimental results in the next sections, more details on the device concept, test structure design, measurement procedure, fabrication and simulation results are discussed in the following subsections. Figures 1(a)-(c) provide more details on the type of devices and test structures analyzed in this paper. Figure 1(a) schematically shows the desired operation of the dual-layer vertical GC, where light from an in-plane waveguide is coupled vertically into a single-mode optical fiber. For optimal scalability and repeatability, these GCs are embedded in loopback test structures that are compatible with wafer-level testing using a fiber array with 250 µm spacing between the fibers ( figure 1(b)). Similar to [36], to guarantee a reasonable footprint of the resulting test structures [4] and allow apples-to-apples comparison with our prior work, focused gratings are utilized (figure 1(c)). 
More details on the layer stack and GC design will be provided in the following subsections.

Measurement analysis procedure
In this paper, we will analyze data of six dual layer 300 mm wafers. These wafers include an (etched) SiN overlay on top of c-Si and each have their own dielectric stack (i.e. thickness of SiN, thickness of oxide layer between c-Si and SiN, c-Si etch depth, etc). As vertical GCs are prone to reflection, leading to Fabry-Perot fringes which complicate extraction of the device characteristics, we apply the measurement analysis procedure developed in [36] for single-layer vertical single polarization grating couplers (GCs). This procedure allows for reproducible extraction of IL, 1 dB-bandwidth, and device reflectivity estimates. Figures 1(d)-(f) illustrate the measurement post-processing for an example dual layer vertical GC. First, the raw transmission and reflection data are obtained and subsequently smoothed using Savitzky-Golay (SG) filters (figure 1(d)). Second, a reflection ripple, with a free-spectral range corresponding to the roundtrip length of the Fabry-Perot cavity formed by the reflection of the two GCs in the loopback test structure, is obtained by subtracting the smoothed spectrum from the raw data (figure 1(f)). A Hilbert transform and subsequent smoothing using an SG filter can be utilized to detect the envelope of the reflection ripple. This envelope can then be used to determine the envelope of the original raw data (figure 1(e)).
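The post-processing steps above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the calibrated procedure of [36]: the function name `split_ripple`, the window lengths, the polynomial order and the synthetic spectrum are all assumptions chosen for the example, and only the sequence of operations (SG smoothing, ripple subtraction, Hilbert envelope, SG smoothing of the envelope) follows the text.

```python
import numpy as np
from scipy.signal import hilbert, savgol_filter


def split_ripple(raw_db, window=101, polyorder=3, env_window=101):
    """Split a raw spectrum (dB) into smooth trend, ripple and ripple envelope.

    window, polyorder and env_window are illustrative assumptions,
    not values taken from the paper.
    """
    raw_db = np.asarray(raw_db, dtype=float)
    # Step 1: Savitzky-Golay smoothing of the raw spectrum.
    smooth = savgol_filter(raw_db, window, polyorder)
    # Step 2: the reflection ripple is the raw data minus the smoothed spectrum.
    ripple = raw_db - smooth
    # Step 3: envelope of the ripple from the analytic signal (Hilbert transform),
    # followed by a second SG filter to smooth the envelope itself.
    envelope = savgol_filter(np.abs(hilbert(ripple)), env_window, polyorder)
    return smooth, ripple, envelope


# Synthetic spectrum (assumption): parabolic passband plus a sinusoidal fringe.
wl = np.linspace(1280.0, 1340.0, 3001)          # wavelength grid in nm
trend = -1.3 - ((wl - 1310.0) / 15.0) ** 2      # smooth GC response in dB
ripple_amp = 0.2                                # fringe amplitude in dB
fsr_nm = 0.5                                    # fringe period in nm
raw = trend + ripple_amp * np.cos(2 * np.pi * wl / fsr_nm)

smooth, ripple, envelope = split_ripple(raw)
# Envelope of the original raw data: smoothed spectrum plus/minus ripple envelope.
upper, lower = smooth + envelope, smooth - envelope
```

In this sketch the SG window spans a few fringe periods, so that the fringe is mostly removed from the smoothed spectrum; for measured data the window would have to be tuned to the actual free-spectral range.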

Fabrication of the dual layer Si+SiN wafers: layer stack, Si/SiN etch misalignment, and wafer splits
In figures 1(g)-(h) we show scanning electron microscope (SEM) pictures of dual vertical GCs at different stages of the fabrication process. The fabrication process for the SiN layer was previously developed for 200 mm wafers [30] and subsequently adapted to the 300 mm wafer fabrication flow [40]. As suggested by [11], we explore two layer stacks: one with nominal SiN thickness 600 and 200 nm pad oxide (i.e. oxide between the top of the unetched c-Si and bottom of the SiN layer), which in simulation obtains good performance, and one with thinner nominal SiN thickness 200 and 300 nm pad oxide, which might be more compatible with fabrication constraints imposed by packaging and fabrication requirements for other devices. The silicon-on-insulator (SOI) process is started with a 310 nm thick Si (to which an oxidation step is applied, reducing it to 300 nm) and a box oxide (BOX) of 720 nm. Similar to [36], we assumed 304 nm Si in our design optimizations and subsequent simulations, but we verified that the influence of this 4 nm difference is negligible. Both Si and SiN layers are patterned with an advanced DUV immersion lithography with a minimum feature size of 60 nm (100 nm) for the Si layer (SiN respectively). The patterning of the SiN, with these critical dimensions and aspect ratios developed by the foundry [29][30][31], can be considered state-of-the-art when compared with similar platforms currently developed by other foundries [32][33][34][35].
The initial layout is post-processed using a custom optical proximity correction (OPC) algorithm to ensure a maximum fidelity between the design and the fabricated structures. OPC has been applied to both the c-Si and the SiN layer. Note that there is a CMP-planarization step after the Si encapsulation, which should result in a flat oxide surface before SiN deposition. However, visual inspection of SEM pictures shows that there is in some designs still a teeth tilt due to imperfection in the surface flatness.
Impressively, due to the usage of immersion DUV lithography, post-fabrication analysis shows that the alignment of the SiN and c-Si layer has a mean shift of <1 nm in both X and Y dimensions, whereas the 3σ is limited to ∼8-9 nm and 4-6 nm for X and Y, respectively (orientations defined in figure 1(c)). As the IL of the designs proposed in [11] has an expected alignment tolerance in the Y direction where even a shift of a few tens of nm would not change the IL beyond ∼0.1 dB, the effect of the rather low experimentally obtained misalignment on device performance is expected to be negligible.
As for the difference between nominal and actual SiN thickness, using ellipsometry we noticed for the wafers with 600 nm SiN target thickness, the actual thickness is 591 nm, with a standard deviation of 2-3 nm. The wafers with 200 nm SiN target thickness have an actual 199 nm thickness. For the pad oxide, the foundry did not have a direct measurement available, but this oxide thickness is estimated to be 13-15 nm (with σ ≈ 6 nm) thinner than nominally intended. Note that prior simulations in [11] indicated that our designs are robust against layer thickness changes on this order of magnitude.
In addition to the split on the SiN thickness, to explore the effect of the c-Si etch depth, we had an etch depth split with nominal values 139 and 159 nm (154 nm actual). The SiN-layer was patterned using a full (nominal) 600 or 200 nm deep etch, respectively. Finally, we analyze six different wafers in this paper, of which four wafers (named W18, W19, W23 and W24) were part of the original splits dedicated to this project and have the same (optimized) lithography dose ('Dose 2'). Later on, two additional wafers (named W07, W11) were added to the splits, which were both processed with a 1 mJ stronger (sub-optimal) lithography dose ('Dose 1'). As a result, even though the same masks were used for the lithography steps in the fabrication of our devices, their geometry will be impacted. For instance, a nominal waveguide width of 400 nm is 10 nm narrower with Dose 1 compared to Dose 2. Conversely, holes are 5-10 nm wider with Dose 1. While a detailed study of the impact of these different doses on our design geometries is non-trivial due to the large variety of dimensions inside our designs, we have spot-checked for a few example devices that the dose choice causes deviations in our DOEs within a similar range.

Design methodology for adjoint and adjoint-inspired dual layer Si/SiN perfectly vertical GCs
In this paper, we investigate both inverse-designed GCs (using the adjoint method) as well as adjoint-inspired devices. For the inverse-designed devices, we have applied the design approach theoretically proposed in [11] to the different layer stacks mentioned in section 2.3. In these devices, the SiN layer is expected to primarily act as an anti-reflection layer, enhancing the directionality and bandwidth of the GC, while also cancelling back-reflections to the input. The Si-layer couples light (slightly) off-vertical in two different sections, which in combination with the SiN design send the combined mode perfectly vertical. Example geometries, and a more detailed analysis of the physics mechanism present in these devices can be found in [11].
On the other hand, similar to prior work for single-layer vertical GCs [36] and wide-bandwidth GCs [12], the adjoint-inspired devices proposed and analyzed in this paper are parameterized piecewise linear approximations of the geometries obtained in some of the best performing inverse designed dual layer GC devices. Specifically, we re-use our previously proposed parameterization scheme for the c-Si layer (more details in [36]), while adopting a naive three piece geometry for the SiN slab which consists of a symmetric configuration of two apodized SiN gratings around a center SiN slab (figure 2). The rationale behind this simplistic symmetric approximation of the SiN geometries returned by the inverse design method is that we wanted to have a geometry with few degrees of freedom, and we noticed that the inverse designed GCs typically had a 'core slab' (providing AR-coating functionality) preceded by a few SiN bars at the beginning of the GC, as well as an additional series of patterned SiN features at the other side of the GC (ideally helping to focus the upwards light in a perfectly vertical direction). Note that in practice, only the SiN features of the first linear apodization region after the first Si etch will have a significant impact on performance, as most light is still confined to the Si slab before that point. Likewise, the final features towards the end of the second linear apodization region should not significantly contribute to the overall IL, as most light has by then already scattered out of the waveguide, and these features only impact the final tail of the mode. Importantly, whereas the result of an inverse design procedure is typically a single fixed, unparameterized design, the easily parameterizable geometries utilized in the adjoint-inspired design make it possible to quickly generate a broad range of designs by taking a full factorial combination of the free parameters in these designs.
This allows designers to explore the parameter space close to well performing devices, where the intention is to generate devices that would also perform well if there are deviations from the nominal fabrication process.
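As an illustration of how such a parameterized DOE can be enumerated, the sketch below generates every combination of a few parameter levels, assuming that the factorial mentioned above refers to a full factorial sweep. The parameter names and values are hypothetical placeholders and do not correspond to the actual parameterization of [36] or of the SiN geometry in figure 2.

```python
from itertools import product


def full_factorial(levels):
    """Return one dict per design: every combination of the given parameter levels."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))]


# Hypothetical parameters and levels, for illustration only.
sin_levels = {
    "core_slab_length_um": [4.0, 5.0, 6.0],
    "first_bar_width_nm": [100, 150, 200],
    "apodization_slope": [0.01, 0.02, 0.03],
    "sin_offset_nm": [-100, 0, 100],
}

designs = full_factorial(sin_levels)
print(len(designs))  # 3 levels for each of 4 parameters: 3**4 = 81 designs
```

The number of designs grows as the product of the number of levels per parameter, which is why a low-dimensional parameterization keeps the DOE size compatible with the available mask space.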

Designs of experiments (DOEs) description
All our wafers contained the same set of design of experiments (DOEs), fabricated using the same lithography masks. Specifically, the main DOEs discussed in this paper are: (a) Adjoint Si (56 inverse designed devices without SiN overlay), (b) Adjoint Si+SiN 200 nm (56 inverse designed devices with SiN overlay, assumed to be 200 nm thick), (c) Adjoint Si+SiN 600 nm thick SiN (56 inverse designed devices with SiN overlay, assumed to be 600 nm thick), (d) Adjoint-inspired Si (576 parameterized adjoint-inspired devices without SiN overlay), (e) Adjoint-inspired Si+SiN 200 nm (729 parameterized adjoint-inspired devices with SiN overlay, assumed to be 200 nm thick), (f) Adjoint-inspired Si+SiN 600 nm (243 parameterized adjoint-inspired devices with SiN overlay, assumed to be 600 nm thick).
Here, the DOEs with pure c-Si designs are used as a baseline for the Si/SiN designs. The Adj. Si DOE is identical to the corresponding DOE discussed in [36], whereas the Adj.-Insp. Si DOE corresponds to Adj.-Insp. DOE 1 analyzed in [36].
As we will discuss in section 2. In hindsight, a more optimal usage of our mask space would have been to merge both DOEs into a bigger one to allow us to explore more geometries. Consequently, in simulation (measurement) Adj.-Insp. 600 nm DOE will not (not significantly) be able to outperform the Adj.-Insp. 200 nm DOE. Finally, for the single-etch c-Si baseline DOEs Adj. Si and Adj.-Insp. Si, we have followed the DOE designs proposed in [36].

Inverse-designed dual layer vertical GCs using adjoint method
For the DOEs of the inverse designed Si+SiN vertical GCs, we adapted the same mitigation technique against etch depth sensitivity as the one proposed in [36]: for both nominal SiN layer-stacks we had access to, we ran an inverse design procedure for a range of different nominal etch depths. As a baseline for our experimental results, using Lumerical's 2D FDTD solver, we subsequently simulated the performance of these devices on some key metrics for some nominal etch depth and SiN thickness combinations. We visualize some example data in figure 3. In line with prior predictions in [11], the device with lowest IL was obtained for the dielectric stack of the deeper 159 nm c-Si etch combined with a 600 nm SiN thickness. Moreover, we observe that best performing devices for a certain assumed wafer thickness were indeed designed for that respective SiN thickness (cf. the difference in performance for different t_SiN choices in the three example scenarios). In addition, several devices have a state-of-the-art ∼20 nm 1 dB-bandwidth or more. We notice that some of the devices with lowest IL obtain moderate <−10 dB worst in-band reflection. Comparison of the best IL within the Adj. DOEs for different nominal dielectric stacks, and additional baseline simulations illustrate the theoretically expected loss reduction in inverse-designed patterned dual-layer vertical Si+SiN GCs compared to single-layer designs (table 1, 'Nominal' and 'Baseline' simulations for the Adjoint devices). First, comparing different DOE types for a given nominal wafer type, DOE types which include a SiN layer (i.e. Adj. Si+SiN which assumed either SiN thickness 200 or 600 nm) outperform a comparable DOE with pure c-Si devices on the same wafer (i.e. Adj. Si). Second, comparing different wafer types (i.e. dielectric stacks) for a given Si+SiN DOE type, the best IL results are obtained in simulations which include the appropriate SiN layer in the dielectric stack (the simulation without the SiN layer contains the same c-Si features as in other dielectric stack simulations, but c-Si is only encapsulated by oxide). Third, simulations in which the patterned SiN overlay is replaced by a single wide slab of SiN covering all etched grating lines in the c-Si layer (i.e. figure 2(c) without the linear apodization sections, and the core slab starting before the first grating line, and ending after the last grating line) show that a SiN slab by itself impacts coupling, but finer patterning in the SiN layer is required for optimal performance.
Table 1. Comparison of nominal, baseline and post-fabrication simulation results with experimental data obtained by full wafer-level testing (WLT) for W07 and W24. Simulations are performed using Lumerical's 2D FDTD solver. Lowest IL is highlighted per column in bold, whereas the best overall IL within the 'nominal', 'baseline' and 'post-fabrication' simulations and the experiments is highlighted in green. Here, only Si+SiN DOEs consist of devices with a SiN overlay, where these SiN features are only expressed in the simulations if the SiN thickness is non-zero. For the experimental data, we reported the median IL for the best devices per DOE type and the interquartile range (IQR) for that best device. As the Adj.-Insp. Si DOE did not perform well in the 5-die measurements, we did not include any of its devices in the device list used for WLT.

Adjoint-inspired dual layer vertical GCs
As an example, we study the simulated performance of the adjoint-inspired DOEs and compare it with the other DOE types for the dielectric stack with 600 nm thick SiN, and a 139 nm c-Si etch (figure 4). Here, it is apparent that the adjoint-inspired devices do not reach the same low IL values as the best devices in the inverse-designed Adj. Si+SiN 600 nm DOE. Indeed, the 2D FDTD simulations reveal that our simple adjoint-inspired parameterization scheme does not incorporate all the physics required to obtain optimal coupling. In other words, the low-dimensional, piecewise linear representation of the adjoint-inspired geometry described in section 2.4 was evidently suboptimal compared to the unrestricted geometry of the adjoint-optimized devices. Similarly, a difference in IL had been obtained for the single-layer adjoint-inspired devices studied in [36] versus their inverse designed counterparts (of which we use the parameterization scheme for the c-Si layer). However, for the dual layer Adj.-Insp. GCs in this paper, the best IL levels are comparable with the best IL values in the Adj. Si+SiN 200 nm DOE (which obviously was not designed for the 600 nm SiN stack we are assuming in the simulations shown in figure 4). Moreover, in terms of IL, both Adj.-Insp. Si+SiN DOEs outperform a comparable Adj.-Insp. Si DOE which is optimized for a single c-Si layer without SiN features (this DOE is based on the one suggested in [36]). As explained in [12,36], a well-designed parameterization of an adjoint-inspired device DOE has devices that cover a broad range in different device figures of merit (e.g. IL, center wavelength, reflection, etc), allowing designers to select optimal devices based on the system-level trade-offs, constraints and objectives that are relevant for their application.
As the reflection, bandwidth, IL and center wavelength metrics in figure 4 show, the adjoint-inspired Si+SiN DOEs indeed cover a wider range of the different device metrics than the limited pure inverse-designed Adj. Si+SiN DOEs (of which the devices need multiple iterations of FDTD simulations, and are consequently more expensive to generate than a single parameterized device of an adjoint-inspired DOE). We have performed a similar analysis on the simulation results we obtained for the other dielectric stacks studied in section 2.6.1, but for brevity, we have only included the minimal IL values per DOE type and wafer type in table 1. These additional simulations prove that inclusion of SiN in the vertical GC design improves performance for adjoint-inspired devices for different c-Si etch depths and SiN thicknesses, although, in contrast to the Adj. Si+SiN DOEs, the benefit of having a patterned SiN layer over an unpatterned SiN slab might be negligible (⩽0.1 dB).

Five-die testing of all DOEs
In this section we investigate which wafer splits have optimal device performance (section 3.1), discuss the differences between different DOE types on the two best wafers (section 3.2) and we conclude by comparing the bivariate relationships of the most important performance metrics of the Si+SiN DOE types with our simulations as well as prior experimental results for pure c-Si DOEs (section 3.3).

Impact of the fabrication conditions of each wafer on the best IL measurement
In table 2, we include information obtained from the foundry's fabrication report of these wafers, combined with a summary of the 5-die measurement data obtained on these wafers. For each of the five dies, we measured the same set of 1746 devices, and, as different GCs might have different optimal alignment wavelengths, we used a fixed set of four different alignment wavelengths (1290, 1300, 1310 and 1320 nm) for each device, with the exception of wafer W07 for which we used only three alignment wavelengths (1300, 1310 and 1320 nm). It takes roughly 1350 minutes to test all these devices on a die, so almost 5 days for a full 5-die measurement. These measurements were performed with an anti-reflection coated fiber to reduce the reflection ripple due to the air-cavity between the fiber facet and the wafer top surface. We report both the best IL per wafer, and the DOE to which the device with this best IL belonged. These numbers should be interpreted cautiously, and cannot directly be compared yet with the median 1.82 dB IL obtained for single etch devices on a pure c-Si wafer in [36] as that number was obtained for a full wafer measurement, i.e. including all dies (a measurement which we will discuss in section 4). Nevertheless, these initial results already seem to suggest that the DOEs on these c-Si/SiN wafers have better performing devices, and that the best performing device is designed with a SiN overlay. Moreover, the best devices often belong to the Adj.-Insp. Si+SiN DOEs, which might indicate that the traditional inverse design method is less robust for this more complicated fabrication process than the adjoint-inspired method. In contrast, for the pure c-Si wafer measured in [36], the best performing device was obtained in the inverse designed DOE. Finally, we observe that the best device DOE does not always correspond to the SiN thickness of the wafer (e.g. for W07, the best device is part of Adj.-Insp. 200 nm, whereas the SiN thickness is 600 nm; however, as explained in section 2.5, part of this is due to suboptimal DOE selection for the Adj.-Insp. Si+SiN DOEs). Another take-home message is that the wafers with thicker SiN (600 nm) obtain better IL, and Si lithography dose 2 works best when combined with a 150 nm partial remaining Si thickness after etching.
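The test-time estimate quoted above follows from simple arithmetic, reproduced here as a short sketch. Only the numbers stated in the text are used; the variable names are chosen for this example.

```python
DEVICES_PER_DIE = 1746
MINUTES_PER_DIE = 1350                      # "roughly", as stated in the text
DIES = 5
ALIGNMENT_WAVELENGTHS_NM = (1290, 1300, 1310, 1320)

total_minutes = MINUTES_PER_DIE * DIES      # 6750 minutes for five dies
total_days = total_minutes / (60 * 24)      # about 4.7 days, i.e. almost 5 days
# Number of device/alignment-wavelength combinations measured on one die.
attempts_per_die = DEVICES_PER_DIE * len(ALIGNMENT_WAVELENGTHS_NM)

print(f"{total_days:.2f} days, {attempts_per_die} alignment attempts per die")
```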

Detailed performance analysis on two best wafers
Subsequently, we experimentally confirmed that the default geometry design for the focused GC's aperture (cf. section 2.3) is indeed optimal. Furthermore, for wafer W07, as a subset of these 5-die measurement results, we show in figure 5(a), for all the inverse designed GCs, different device performance metrics related to IL, bandwidth, reflection etc. For the reflection, we report both worst (smoothened) GC-fiber reflection within the 1 dB band as well as the R1 reflection metric introduced in [36], which is an approximation of the average in-band in-waveguide reflection of the GC. On the x-axis, we include the etch depth used in the device design optimization procedure (which might be different from the actual etch depth on the wafer). Here, it can again be seen that, for the same wafer, the measurements for the DOEs which do include SiN include the best IL performance values. However, in some cases, devices designed for 200 nm thick SiN perform slightly better than devices designed for 600 nm thick SiN. Moreover, we notice that devices can be obtained with >20 nm 1 dB-bandwidth, and moderately low reflection metrics. Specifically, for the reflection, we obtain values which are <−10 dB, but still >−20 dB. These values are hence stronger than the reflection one can obtain with traditional single-polarization GCs which are not perfectly vertical [5,6,11,41]. Additionally, by comparing our reflection simulations with recent results on certain perfectly vertical GCs based on SWL engineering in single-layer dual etch fabrication processes, the latter devices might also result in lower reflections as long as the etch misalignment and etch depths can be well maintained in the process [42].
In figures 5(b)-(c), for wafer W07 and W24, respectively, we show for all the DOE types included in the 5-die measurement data the raw distribution of the IL (which might include failed alignment attempts for certain devices). Again, we observe that the best devices without SiN overlay perform worse than the best devices with such patterned overlay. In fact, the ∼1 dB improvement of minimal IL for W07 and the ∼0.8 dB improvement for W24 for the DOE types with overlay compared to the one without is promising as it is better than the ∼0.5 dB theoretic prediction from [11]. However, as mentioned previously, these 5-die numbers need to be interpreted with caution as they might be sensitive to outliers. Consequently, we will gather more statistics in section 4 when discussing the full wafer measurements of the best performing devices to confirm this trend. In addition, in agreement with the discussion of table 2, the device

Experimental bivariate relationships between the performance metrics for the Si+SiN DOE types
As a last analysis result of the 5-die measurement data, to highlight the experimentally obtained bivariate relationships between the most relevant performance metrics, we represent measurement data for W07 in a similar scatter plot matrix as the one obtained for simulation data in figure 4 (figure 6). Here, as previous data representations already illustrated that pure c-Si DOEs have a worse best IL compared to Si+SiN devices, we have only included the Si+SiN devices. In general, we observe an equally broad coverage of the performance metrics (e.g. center wavelength range, 1 dB bandwidth, etc) and similar degrees of correlation by the adjoint-inspired DOE types with Si+SiN as was obtained in prior experimental work on pure c-Si adjoint-inspired vertical GCs [36], but containing devices with on average a better IL. In line with some of the previous figures, we notice that our Adj.-Insp. data clouds cover a broader range of specs than the pure Adj. Si+SiN DOEs, allowing for a posteriori device selection in these DOEs based on the different figures of merit and constraints that are relevant to a broad set of applications [36]. In contrast to the simulation results in figure 4, we do not observe a significantly better performance of the Adj. Si+SiN 600 nm DOE compared to the other DOE types. We obtained similar results for W24, albeit with a global ∼20 nm shift of the DOEs in their center wavelength towards lower wavelengths.

Full wafer-level testing of shortlisted devices
In the previous section, we performed 5-die measurements on six wafers with different processing conditions, including splits on the c-Si grating etch depth, and SiN layer thickness. In this section, we analyze the full WLT data on the two wafers on which we obtained a best IL measurement of 1.05 dB: W07 and W24. For this purpose, we down-selected a list of 40 devices out of the initial list of 1746 devices. We selected the same 40 devices for both wafers. The resulting more extensive set of measurements on (unless otherwise mentioned) 66 dies (compared to the initial 5 dies) provides us with more statistics on the performance of the devices, allowing a fair comparison with the full WLT data obtained for pure single layer vertical GCs in [36]. Similar to section 3, we have performed alignment attempts for four different alignment wavelengths for both wafers: 1290 nm, 1300 nm, 1310 nm, 1320 nm.

Example data and relative performance of different DOE types
In figure 7, we report important device performance metrics such as band peak (i.e. IL), 1 dB-band center wavelength, band 1 dB (i.e. 1 dB-bandwidth) and R1 for a subset of 10 typical devices out of the initial list of 40 pre-selected for WLT. Note that the exemplary spectrum of a die measurement shown in figures 1(d)-(f) is the Adj. Si+SiN 200 nm device of W07 with device index 3.
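To make the reported metrics concrete, the following sketch shows one way in which band peak (IL), 1 dB-band center wavelength and 1 dB-bandwidth could be extracted from a smoothed transmission spectrum. It is an illustration under assumptions (a single-peaked spectrum sampled on a wavelength grid, band edges taken at the outermost samples within 1 dB of the peak); the exact definitions used in [36] may differ, and the synthetic spectrum is not measured data.

```python
def band_metrics(wavelengths_nm, transmission_db, drop_db=1.0):
    """Return (IL in dB, band center in nm, bandwidth in nm) of a smoothed spectrum."""
    n = len(transmission_db)
    peak = max(range(n), key=lambda i: transmission_db[i])
    threshold = transmission_db[peak] - drop_db
    # Walk outwards from the peak while the transmission stays within drop_db.
    lo = peak
    while lo > 0 and transmission_db[lo - 1] >= threshold:
        lo -= 1
    hi = peak
    while hi < n - 1 and transmission_db[hi + 1] >= threshold:
        hi += 1
    insertion_loss = -transmission_db[peak]
    center = 0.5 * (wavelengths_nm[lo] + wavelengths_nm[hi])
    bandwidth = wavelengths_nm[hi] - wavelengths_nm[lo]
    return insertion_loss, center, bandwidth


# Synthetic parabolic passband (assumption), sampled every 0.1 nm.
wl = [1280 + i / 10 for i in range(601)]
spectrum = [-1.3 - ((w - 1305.0) / 11.03) ** 2 for w in wl]

il, center, bandwidth = band_metrics(wl, spectrum)
print(il, center, bandwidth)  # about 1.3 dB, 1305 nm and 22 nm on this grid
```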
For both wafers, using the measurement data for the best alignment wavelength per device, we can confirm the conclusion from the 5-die measurement that devices with SiN features perform better in IL than devices without such features. Moreover, we again observe that good IL can be obtained both for parameterized adjoint-inspired devices (Adj.-Insp. Si+SiN), as well as devices designed with the 'pure' adjoint method (Adj. Si+SiN). Just like in the 5-die measurements, we also notice that optimal performance of IL can be obtained for devices that were nominally designed for other SiN thicknesses than the actually fabricated SiN thickness. For instance, the Adj. Si+SiN device shown in figure 7, was nominally designed for a 200 nm SiN thickness, whereas both W07 and W24 had a target nominal 600 nm thickness for the SiN layer (actual SiN thickness was 591 nm). Furthermore, we observe R1 (1 dB band) values ⩽−10 dB, which is in line with the estimates of the within-band reflection based on the 5-die measurements.
While we restricted the data included in figure 7 to avoid overloading the figure, unless otherwise mentioned, the remaining discussion will be based on experimental data of all 40 measured devices. For wafer W07, for the best SiN device, we obtain a 1.33 dB (1.34 dB) median IL for alignment wavelength 1300 nm (1310 nm), and a similar optimal median IL for the Adj.-Insp. Si+SiN devices, which is ∼1.2 dB better than the corresponding median IL for the best performing Adj. Si (pure c-Si) device at its optimal alignment wavelength. On the other hand, for wafer W24, we obtain a median 1.29 dB IL for the best Adj.-Insp. Si+SiN device, and 1.35 dB for the best Adj. Si+SiN device, both at optimal alignment wavelength 1290 nm (i.e. more off-target from the intended 1310 nm design wavelength), while the best median IL for pure c-Si device designs is ∼2.2 dB for alignment wavelengths 1290, 1300, and 1310 nm.
Additionally, we have added the best median IL per DOE type for W07 and W24 in table 1. Given the gauge R&R of our setup, both wafers obtain a similar best performance. In contrast to the simulations in this table, we do not observe an IL advantage for Adj. Si+SiN 600 nm over the other Si+SiN DOE types.
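The selection of the best alignment wavelength per device, based on the median IL over all measured dies, can be sketched as follows. This is a minimal illustration and not the actual analysis pipeline: the data layout, the function name `best_alignment`, the device labels and all IL values in the example are hypothetical.

```python
from statistics import median


def best_alignment(il_db):
    """Map device -> (best alignment wavelength in nm, median IL in dB).

    `il_db` maps (device, alignment wavelength) to a list of per-die IL
    values in dB. The best wavelength is the one with the lowest median IL.
    """
    best = {}
    for (device, wavelength_nm), values in il_db.items():
        med = median(values)
        if device not in best or med < best[device][1]:
            best[device] = (wavelength_nm, med)
    return best


# Hypothetical per-die IL values in dB (not measured data).
example = {
    ("dev_A", 1290): [1.45, 1.50, 1.48],
    ("dev_A", 1300): [1.33, 1.30, 1.36],
    ("dev_A", 1310): [1.34, 1.35, 1.31],
    ("dev_B", 1300): [2.25, 2.20, 2.30],
    ("dev_B", 1310): [2.18, 2.22, 2.21],
}
result = best_alignment(example)
```

Using the median rather than the mean keeps the ranking insensitive to a small number of dies with failed alignment attempts.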

Comparison of new c-Si device measurements with previously published reference data
Based on table 1, it is noteworthy that the median IL of ∼2.5 dB (∼2.2 dB) for wafer W07 (W24) of the best performing pure c-Si designs on these wafers is worse than the median 1.82 dB IL for the best c-Si design previously reported in [36]. We believe the differences in fabrication (lithography dose, c-Si etch depth, inclusion of additional etch steps for other devices, inclusion or omission of an annealing step to reduce sidewall roughness, changes in top oxide thickness, etc) of the two c-Si+SiN wafers compared to the pure c-Si wafer could lead to such IL deviations. Indeed, to exclude the effect of changes in IL due to changes in the measurement setup, we have measured wafer W09, a pure c-Si wafer without SiN overlay and, similar to the results reported in [36], we obtained a 1.92 dB median IL for a representative design of record at alignment wavelengths 1300 nm and 1310 nm. The IL difference of ∼0.1 dB with [36] is within the bounds of what can be expected for the gauge R&R of our setup, which we quantified for this measurement series to be 0.1-0.2 dB. As part of this additional WLT measurement of wafer W09, we also obtained 2.20 dB (2.18 dB) median IL for the same 'Adj. Si' design reported in figure 7 at alignment wavelength 1300 nm (1310 nm). Overall, we estimate a ⩾ 0.5 dB improvement in median IL performance for the best SiN compared to the best pure c-Si device design. This is slightly less than our initial estimate in section 3 based on the 5-die measurements, but in line with the predicted 0.5 dB improvement mentioned in [11] for similar device designs and slightly bigger than the difference obtained based on a comparison with the Adj. Si simulations shown in table 1.
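The arithmetic behind this comparison can be retraced with the short sketch below, which only uses median IL values quoted in the text. Pairing the best Si+SiN median of each wafer with both pure c-Si references is our own choice for illustration; the resulting differences range from roughly 0.5 dB to 0.6 dB, which should be read against the 0.1-0.2 dB gauge R&R.

```python
# Median IL values in dB, copied from the text.
best_si_sin = {"W07": 1.33, "W24": 1.29}
pure_si = {"[36]": 1.82, "W09": 1.92}
gauge_rr_db = 0.2  # upper end of the quoted 0.1-0.2 dB gauge R&R

# Setup check: difference between the two pure c-Si reference measurements.
setup_shift = round(pure_si["W09"] - pure_si["[36]"], 2)
setup_consistent = setup_shift <= gauge_rr_db

# Improvement of each Si+SiN wafer with respect to each pure c-Si reference.
improvement = {
    (wafer, ref): round(ref_il - il, 2)
    for wafer, il in best_si_sin.items()
    for ref, ref_il in pure_si.items()
}

for (wafer, ref), delta in sorted(improvement.items()):
    print(f"{wafer} vs {ref}: {delta:.2f} dB")
```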

Robustness of best performing Si+SiN device design on each wafer
Moreover, as a measure of robustness, we notice that the (representative) interquartile ranges (IQR) of the IL box plots in figure 7 for the various best performing Si+SiN DOE designs (at optimal alignment wavelength) are on the order of ∼0.1 − 0.2 dB, similar to the IQR values of the experiments reported in table 1. We have confirmed this range to be similar to the IQR values of pure c-Si devices on the same W07 and W24 wafers (which did include the additional SiN processing steps). We obtained interquartile ranges of ∼0.05-0.1 dB for the best devices on W09, the baseline wafer without a SiN layer. Consequently, the IL spread of the designs with SiN overlay does not suffer from a significant decrease in fabrication robustness compared to the IL spread of the pure c-Si designs. In addition to the IQR values for the IL in figure 7 and table 1, we also report minimum, maximum, mean, standard deviation and 3σ for the IL distribution for the measurements (at optimal alignment wavelength) of the respective best devices of wafer W07 (i.e. the device used as exemplary measurement in figures 1(d)-(f)) and W24. Note that, unfortunately, one die measurement for W07 for this design failed to complete, so only 65 die measurement results are included in this distribution, but the impact on the distribution should be negligible. Similar to what can be seen for some of the IL distributions in figure 7, the W24 distribution suffers from outliers (due to failed alignment attempts), impacting the maximum and standard deviation of the IL. Given this sensitivity to outliers, for future reference, we also report IQR values for the center wavelength, bandwidth and R1 in table 3.
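The sensitivity of the maximum and standard deviation to outliers, in contrast to the median and IQR, can be illustrated with the sketch below. The IL values are hypothetical and only mimic a single failed alignment attempt; the function name `il_summary`, the choice of the inclusive quantile method and the interpretation of 3σ as three times the sample standard deviation are assumptions.

```python
from statistics import mean, median, quantiles, stdev


def il_summary(il_db):
    """Summary statistics of a list of per-die IL values in dB."""
    q1, _, q3 = quantiles(il_db, n=4, method="inclusive")
    sigma = stdev(il_db)
    return {
        "min": min(il_db),
        "max": max(il_db),
        "mean": mean(il_db),
        "std": sigma,
        "3sigma": 3 * sigma,  # assumption: 3σ is reported as 3 x std
        "median": median(il_db),
        "iqr": q3 - q1,
    }


# Hypothetical IL values in dB (not measured data).
clean = [1.25, 1.28, 1.30, 1.31, 1.33, 1.35, 1.38, 1.40]
with_outlier = clean + [6.0]  # one die with a failed alignment attempt

stats_clean = il_summary(clean)
stats_outlier = il_summary(with_outlier)
```

In this example a single outlier increases the standard deviation by more than an order of magnitude, whereas the median and IQR change by only a few hundredths of a dB.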

Role of SiN features in experimental IL improvement
As we have not included additional GC test structures with a simple wide SiN slab above the c-Si grating lines (i.e. structures similar to the ones discussed in section 2.6.2 for the simulation results in this table), it is challenging to pinpoint the exact mechanism that leads to experimental performance improvements for the Si+SiN devices. In the analysis of the simulation results in [11], it was highlighted how the inclusion of a patterned SiN layer that provides anti-reflection (AR) and light focusing can provide an advantage over the single-etch design. However, given that some of the best performing devices on certain wafer splits (e.g. SiN thickness 600 nm for both W07 and W24) were actually designs intended to function well on other splits (e.g. SiN thickness 200 nm) and given the fabrication deviations one might expect for one of the first c-Si+SiN fabrication runs offered by the foundry on their 300 mm wafer platform, we notice a certain lack of sensitivity to the precise SiN feature patterning in our experimental results. Consequently, one would be tempted to conclude that the biggest contribution to IL improvement is the capability of the center SiN slab to act as an AR-coating, with the influence of the intended additional focusing by the first and last SiN GC lines being less prominent than anticipated in simulations. However, on closer inspection of 9 representative devices with indices 451-459 which were included in our full WLT for the Adj.-Insp. Si+SiN 200 nm DOE (some of which are shown in figure 7), a set of devices which consists of a full factorial of 3 different center SiN slab widths and 3 different offsets between the c-Si and SiN pattern, we noticed that the best performing devices in this set are not necessarily the ones with the widest SiN slab, nor the ones with the earliest starting position of this SiN slab.
This suggests that the detailed SiN patterning can still impact the IL amongst the lowest loss GCs in the DOE in a non-trivial way, and our current Si+SiN DOEs consequently seem to rely on a certain interplay between the c-Si and SiN layers.

Beam angle and mode profile extraction
In addition to the WLT series reported in figure 7, we have performed both (1) beam angle and (2) mode profile extraction. Figure 8(c) shows an example experimental measurement of the mode profile, which is the result of the convolution of the Gaussian fiber mode with the mode of the 0 deg GC for a die on wafer W07. These mode profiles are qualitatively similar to the ones obtained for the pure c-Si 0 deg GC counterparts reported in [36].
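Because the measured profile is a convolution of the fiber mode with the GC mode, it is broader than the GC mode itself. The one-dimensional sketch below illustrates this under strong simplifying assumptions: both modes are taken to be Gaussian, the widths of 3.0 µm and 3.5 µm are hypothetical, and all helper names are our own. For two Gaussians, the widths are expected to add in quadrature.

```python
import math


def gaussian(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2)


def convolve(a, b):
    """Full discrete convolution of two sampled profiles."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def rms_width(profile, step):
    """Root-mean-square width of a sampled profile, in units of `step`."""
    total = sum(profile)
    centroid = sum(i * p for i, p in enumerate(profile)) / total
    variance = sum((i - centroid) ** 2 * p for i, p in enumerate(profile)) / total
    return step * math.sqrt(variance)


step_um = 0.1
xs = [step_um * (i - 200) for i in range(401)]  # -20 um to +20 um

# Hypothetical Gaussian profiles; the widths are illustrative only.
fiber_mode = [gaussian(x, 3.0) for x in xs]
gc_mode = [gaussian(x, 3.5) for x in xs]
measured = convolve(fiber_mode, gc_mode)

width_measured = rms_width(measured, step_um)
width_expected = math.sqrt(3.0 ** 2 + 3.5 ** 2)  # quadrature sum
```

Recovering the GC mode from a measured profile would therefore require a deconvolution with the known fiber mode, which is not attempted in this sketch.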

Discussion and future work
The best observed median ∼1.3 dB IL in these experiments is ∼0.7 dB worse than the best simulated IL in table 1, but it still puts the IL of these devices on equal footing with traditional single-etch GCs which are not perfectly vertical [3,11]. For perfectly vertical GCs, it outperforms a previous record experimental 1.5 dB IL (single device measurement, not WLT) obtained using a dual etch device proposed in [16] for 1550 nm. Note that the latter device did not require a SiN layer, but had critical dimensions of 30 nm, which is in contrast to our 193 nm DUV lithography compatible devices. In addition, the performance of that device would be more sensitive to misalignment of the two different etch depths than our devices are to misalignment between the c-Si and SiN etch steps. As the current fabrication run was one of the first iterations of the foundry's new Si+SiN 300 mm process, we expect that future improvements in process control might result in additional improvements in IL for the dual layer perfectly vertical GCs. Additionally, inclusion of sub-wavelength features in the c-Si in a way that respects the minimal feature sizes imposed by 193 nm immersion DUV lithography is expected to reduce coupling losses even more [19,26,28]. Indeed, a recent SWL dual etch design for perfectly vertical coupling at 1550 nm obtained 0.35 dB IL in simulation [21], which is 0.2 dB better than the best simulation results obtained for the DOEs reported in table 1, which were restricted to a single c-Si etch (in addition to another SiN etch with larger minimal feature size).
Finally, while we believe our design methodologies and measurement procedures will be useful for other applications such as perfectly vertical TM GCs, or perfectly vertical polarization-diversity GCs [18], we believe the reflection in such devices will, similar to the reflection present in the TE devices presented in this paper, remain a key priority for future research. Indeed, our TE devices have significantly lower reflection than what we would obtain in naive designs for perfectly vertical GCs such as single-layer uniform GCs. However, as is evident from the reflection-induced ripple in the transmission spectrum of one of our best performing devices shown in figures 1(d)-(f), reducing reflection with either novel design techniques or more advanced fabrication capabilities remains key to making perfectly vertical GCs attractive for applications with stringent requirements on reflection, such as optical interconnects. Admittedly, the reflection data in this paper needs to be interpreted cautiously, as the WLT-procedure utilized here is optimized to efficiently extract IL, center wavelengths and bandwidths of our DOEs in an automated way, but the measurement procedure is known to be sensitive to spurious reflections such as an air-oxide interface at the top of the wafer. In contrast to WLT-procedures, such an air-oxide interface would be removed in measurement procedures performed in actual connector implementations due to the presence of oxide filling between fiber and GC. Nevertheless, the currently observed moderate reflections will still remain a deal breaker in many applications, and in future characterization of perfectly vertical GCs, it is strongly advised to include interferometric reflection test structures to monitor the in-waveguide reflection more accurately [36] and simultaneously adapt the GC designs to mitigate this issue.
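To give a feeling for how reflection levels translate into spectral ripple, the sketch below uses the textbook expression for the peak-to-valley ripple of a lossless cavity formed between two reflectors. Modeling a GC-waveguide-GC test structure in this way is an assumption: waveguide loss and spurious reflections (e.g. from an air-oxide interface) are ignored, and the function name `ripple_db` is hypothetical.

```python
import math


def ripple_db(r1_db, r2_db):
    """Peak-to-valley transmission ripple (dB) of a lossless two-reflector cavity.

    r1_db and r2_db are the power reflections of the two reflectors in dB
    (negative numbers, e.g. -10 for 10 % power reflection).
    """
    r1 = 10 ** (r1_db / 10)
    r2 = 10 ** (r2_db / 10)
    g = math.sqrt(r1 * r2)  # round-trip field amplitude factor
    return 20 * math.log10((1 + g) / (1 - g))


for level_db in (-10, -15, -20):
    print(f"R = {level_db} dB per reflector: ripple = {ripple_db(level_db, level_db):.2f} dB")
```

Under these assumptions, two reflectors at −10 dB each would already produce a ripple of more than 1.5 dB, whereas −20 dB per reflector would reduce it to below 0.2 dB.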

Conclusion
In this paper, we experimentally validated two different design paradigms for perfectly vertical dual layer Si+SiN single-polarization GCs: GCs designed using inverse design based on the adjoint method (as proposed in our previous work [11]), and adjoint-inspired GCs (adapting the technique proposed in [36] to the dual layer stack). These devices were fabricated on a hybrid Si/SiN platform using scalable immersion DUV lithography on 300 mm wafers, with critical dimensions (60 nm) for the c-Si layer as well as relatively high aspect ratios (6 : 1) for the SiN features that can be considered state-of-the-art for the silicon photonics community [29][30][31][32][33][34][35]. For both design methodologies, using WLT, we experimentally demonstrated, for the first time, a record low median IL of 1.3 dB (with interquartile range of ∼0.1−0.2 dB) for perfectly vertical coupling for dual layer device designs that are compatible with volume fabrication, which is ∼0.5 dB better than our previously demonstrated single layer, single-etch c-Si alternative [36]. We observe a comparable fabrication robustness for the dual layer devices compared to the pure c-Si designs (e.g. in terms of the spread of IL of the best performing devices). Furthermore, we obtained state-of-the-art 1 dB-bandwidths, combined with moderate reflection levels and center wavelengths in the O-band. The analysis of the different DOEs in this paper confirms previous simulation results [11] in which it was predicted that adding SiN to the layer stack for 0 deg GCs is beneficial compared to single-layer GC designs. However, in contrast to simulations, we do not observe a benefit in IL of the pure adjoint devices compared to the adjoint-inspired devices, which we expect to be related to fabrication deviations in the SiN layer.
We expect that further IL reductions towards reproducible, DUV lithography compatible experimental sub-dB performance for perfectly vertical single-polarization dual-layer GCs can be obtained either by improvements in the fabrication of the SiN features or by inclusion of SWL features in the design (especially in the c-Si layer, where the minimal feature size is smaller). Here, from the use-case perspective, priority should be given to designs that can experimentally reduce the reflection compared to the devices presented in this paper.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.