Impact of device scaling on the electrical properties of MoS2 field-effect transistors

Two-dimensional semiconducting materials are considered as ideal candidates for ultimate device scaling. However, a systematic study on the performance and variability impact of scaling the different device dimensions is still lacking. Here we investigate the scaling behavior across 1300 devices fabricated on large-area grown MoS2 material with channel length down to 30 nm, contact length down to 13 nm and capacitive effective oxide thickness (CET) down to 1.9 nm. These devices show best-in-class performance with transconductance of 185 μS/μm and a minimum subthreshold swing (SS) of 86 mV/dec. We find that scaling the top-contact length has no impact on the contact resistance and electrostatics of three monolayers MoS2 transistors, because edge injection is dominant. Further, we identify that SS degradation occurs at short channel length and can be mitigated by reducing the CET and lowering the Schottky barrier height. Finally, using a power performance area (PPA) analysis, we present a roadmap of material improvements to make 2D devices competitive with silicon gate-all-around devices.

CMOS technology has steadily followed Moore's law of device scaling for the past 50 years to achieve higher transistor density, higher speed and power improvements. A significant part of this device scaling, especially for the planar metal-oxide-semiconductor field-effect transistor (MOSFET), was achieved by scaling the gate length 1 . This scaling is reaching its limits as short channel effects (SCE) significantly degrade the device performance. To partially overcome SCE, the tri-gate (FinFET) structure has been introduced 2 . For future technology nodes, the gate-all-around nanosheet FET, which sandwiches thin layers of silicon channel between multiple gates, is expected to provide additional improvements. Both configurations enhance the electrostatic control over the channel and allow for further gate length scaling. However, it has been reported 3 that the required scaling of the silicon channel thickness below 10 nm severely degrades the carrier mobility due to increased surface-roughness scattering. In this context, two-dimensional (2D) semiconducting materials such as transition-metal dichalcogenides (TMDs) are considered to be ideal candidates due to their naturally passivated surface and ultra-thin body (1 monolayer MoS 2 ~ 0.65 nm), providing excellent gate control and enhanced transport 4-7 . However, since many studies are performed with manually exfoliated flakes, for which collecting large datasets is very labor-intensive, there has been a strong focus on selecting only top-performing devices, at the cost of device understanding. Until recently, only a few TMD studies [8][9][10] have focused on devices fabricated using large-area grown films. Especially for device scaling 11 , a statistically significant set of data is still lacking.
Therefore, we carry out a study of the impact of geometrical scaling on an extensive data set of large-area grown tri-layer MoS 2 MOSFETs (1300 devices). We investigate the impact of scaling the channel length (L ch ) and width (W ch ), contact length (L cont ) and effective oxide thickness (EOT) on various device performance metrics such as the on- and off-current (I on , I off ), contact resistance (R c ), subthreshold swing (SS), interface trap density (D it ) and threshold voltage (V T ). We demonstrate that scaling the contact length down to 13 nm has no impact on the device performance. This confirms that carrier injection occurs exclusively from the edge of the metal directly into the thin TMD channel, which is in line with our TCAD simulations. Further, using our large data set, we make a detailed assessment of the scaling trends of SS and V T with device dimensions. We identify the variation in the number of MoS 2 layers in the channel and contact regions as a possible source of SS degradation and V T variability for ultra-scaled TMD MOSFETs. Such insights are crucial for device understanding and enable device architectures such as double-gate 12 or stacked TMD FETs to outperform Si FETs 13 . This article is an extension of our previous work presented at IEDM 2019 10 .

Scaling of on- and off-state currents. From the representative transfer characteristics in Fig. 1g, we observe that the off-state current significantly increases as L ch is scaled, as a result of a loss of gate control. Accordingly, we extract the minimum current in the entire back-gate sweep (I min ), and we observe that it is the same for both oxides and lower than the noise floor of the tool (< 1 pA). However, when comparing the I off in the scatterplot of Fig. 2a, which is extracted at a fixed displacement field of 0.4 V/nm below V T,CC (i.e., |V GS − V T,CC |/CET = 0.4 V/nm), we note that the HfO 2 sample exhibits a higher I off than the SiO 2 sample.
This suggests that the subthreshold swing is limited by the high interface trap density (see Section D).
We also note that for both oxides, I off degrades with smaller L ch . This is mainly due to the SS degradation observed for short-L ch devices, which is discussed further in Section E. Next, we evaluate the I on at a fixed charge density (n s ) of 10 13 cm −2 and do not observe any difference between the 50 nm SiO 2 and 12 nm HfO 2 samples (Fig. 2a). This indicates that the carrier transport in the MoS 2 channel is predominantly limited by charged impurities 14 in the MoS 2 or at the interfaces, and not by remote phonons 15 in the gate oxide.

Figure 2. (a) Scatterplot showing I on extracted at n s = 1e13 cm −2 and I off extracted at a fixed displacement field of 0.4 V/nm below V T,CC , for V DS = 1 V. I on for the 50 nm SiO 2 and 12 nm HfO 2 devices overlap, indicating no impact on low-field mobility and contact barrier. I on roughly scales as 1/L ch for L ch > 500 nm and saturates for L ch < 50 nm. I off is higher for HfO 2 compared to SiO 2 . (b) I D -V DS for L ch = 500 nm shows a linear triode regime and saturation at high V DS . The dashed line follows the current at V DS = V OV . While the onset of saturation follows V OV at low V GS , the current saturates at V DS = 2.4 V for high V GS . The saturation current scales roughly quadratically with V OV at low V GS and roughly linearly with V OV at high V GS . (c) I D -V DS for L ch = 30 nm shows a non-linear triode regime due to Schottky contacts and saturation at high V DS . The saturation current follows a similar trend as for L ch = 500 nm, but the V DS at the onset of velocity saturation is reduced to 1.4 V. (d) Conduction band profile for the L ch = 30 nm device with Schottky contacts, shown for low and high V DS . The Fermi levels at the source and drain are indicated by E FS and E FD , respectively. The Schottky barrier is shown as the abrupt potential change at the contact-channel interface. At low V DS , I DS is determined by the Schottky contacts. At high V DS , although the potential drops significantly at the source contact, velocity saturation or pinch-off near the drain determines the I D characteristics.

In the long-channel limit (L ch > ~ 500 nm), the I on increases roughly proportionally to 1/L ch (Fig. 2a) and the device operates in the triode region (illustrated in Fig. 2b for the 12 nm HfO 2 sample and L ch = 500 nm), i.e., the gate overdrive V OV (= V GS − V T ) > V DS for both oxides. The drain current also exhibits a strongly linear dependence on V DS in the triode region (Fig. 2b), suggesting that the channel resistance is dominant for this L ch and beyond. We also extract a low-field field-effect mobility of ~ 15 cm 2 /V.s (inset of Fig. 3c) using the transfer length method (TLM) for both the samples with 12 nm HfO 2 and 50 nm SiO 2 . At higher lateral electric field (higher V DS ), I D saturates (Fig. 2b), and the saturation current scales quadratically with V OV (here V T,CC = −0.4 V) due to channel pinch-off near the drain. However, for the highest V OV (~ 2 to 2.4 V), the saturation current scales roughly linearly with V OV , indicating that it is limited by saturation of the drift velocity at high lateral field 16 (F LATERAL > 5 V/μm).
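As a rough illustration of the extraction conditions used above, the following Python sketch converts the fixed displacement field (for I off ) and the fixed charge density (for I on ) into gate biases. It assumes a simple charge-sheet relation n s = C ox (V GS − V T )/q, with C ox computed from the CET treated as an SiO 2 -equivalent thickness. The function names and the example values of V T,CC and CET are hypothetical and chosen only for illustration; they are not the authors' extraction code.

```python
# Minimal sketch (assumptions: charge-sheet model, CET is an SiO2-equivalent thickness).
Q = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-14  # vacuum permittivity, F/cm
K_SIO2 = 3.9             # relative permittivity of SiO2


def cox_from_cet(cet_nm):
    """Gate capacitance per unit area (F/cm^2) for a given CET in nm."""
    return K_SIO2 * EPS0 / (cet_nm * 1e-7)


def vgs_for_ioff(vt_cc, cet_nm, d_field_v_per_nm=0.4):
    """Gate bias at which I_off is read: a fixed displacement field below V_T,CC."""
    return vt_cc - d_field_v_per_nm * cet_nm


def overdrive_for_ns(ns_cm2, cet_nm):
    """Gate overdrive V_OV (V) needed to reach a sheet charge density n_s."""
    return Q * ns_cm2 / cox_from_cet(cet_nm)


# Hypothetical example values (not measured data).
vgs_off_thick = vgs_for_ioff(vt_cc=-0.4, cet_nm=50.0)
vov_thick = overdrive_for_ns(1e13, cet_nm=50.0)
vov_thin = overdrive_for_ns(1e13, cet_nm=1.9)
print(f"I_off read at V_GS = {vgs_off_thick:.1f} V for CET = 50 nm")
print(f"V_OV for n_s = 1e13 cm^-2: {vov_thick:.1f} V (50 nm), {vov_thin:.2f} V (1.9 nm)")
```

Under these assumptions, a charge density of 10 13 cm −2 corresponds to roughly 23 V of overdrive for a 50 nm SiO 2 -equivalent thickness, but to less than 1 V at a CET of 1.9 nm, which is why comparing oxides at fixed n s rather than fixed V GS is the more meaningful choice.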
In the short-channel limit (L ch < ~ 50 nm), the dependence of I on on L ch saturates (Fig. 2a). Accordingly, in the output characteristics for L ch = 30 nm (Fig. 2c), we make two observations: (1) super-linear I D for V DS < 0.4 V and (2) saturation of I D for V DS > 1.4 V. The distinct super-linear dependence of I D on V DS (Fig. 2c) suggests that the Schottky contacts at the metal-MoS 2 interface limit the current even though the bias conditions (V OV > V DS , here V T,CC = −0.3 V) ensure that the channel is continuously accumulated with electrons. At higher V DS , I D saturates similarly to the L ch = 500 nm device. The current at the onset of saturation is roughly proportional to V OV raised to the power 1.5-1.7 at low V OV and 0.8-0.9 at high V OV , closely following the long-channel characteristics. This indicates that while contact resistance dominates at low V DS , velocity saturation or pinch-off near the drain determines the current at high V DS .
We can further understand both these observations from the simulated conduction band profile of the L ch = 30 nm device (Fig. 2d) for low and high V DS . In the linear regime (V DS = 0.2 V and V OV > V DS ), the drain-source potential is predominantly dropped across the reverse-biased source and forward-biased drain Schottky contacts. With increasing V DS (higher lateral field), the transmission probability across the Schottky contacts increases rapidly, especially across the reverse-biased source, giving rise to the super-linear dependence of I D on V DS . At even higher V DS (V DS = 1.2 V), the electric field in the channel near the drain is large enough to cause either pinch-off at low V OV or saturation of the carrier drift velocity at high V OV . This results in saturation of the current.

Figure 3 (panels c and d). (c) R total /2 (at n s = 1e13 cm −2 ) versus L ch shows saturation below L ch = 50 nm due to contact resistance. An upper limit for R C is obtained as the median R total /2 for L ch = 30 nm. Median R C values of 3 kΩ.μm, with best performers at 2 kΩ.μm, are obtained. (Inset) TLM fit of R total /2 (at n s = 1e13 cm −2 ) versus L ch gives R c = 2.7 kΩ.μm and a field-effect mobility of 15 cm 2 /V.s. (d) R total /2 versus n s for L ch = 30 nm at V DS = 1 V for 8 devices. R C reduces significantly at n s = 2e13 cm −2 due to better carrier injection into the accumulated channel.

Contact length scaling. Figure 3a shows that I on (at n s = 10 13 cm −2 ) does not degrade as L cont is scaled down to 13 nm. This agrees with TCAD simulations 10,17,18 that predict contact edge injection of carriers for 1-3 layers of MoS 2 channel. This observation holds true for three different L ch (30 nm, 100 nm, 500 nm) over a wide range of L cont (500 nm to 13 nm) and for varying lateral field (V DS = 0.05 V, 1 V). In all three cases, as predicted, we do not observe any systematic degradation of I on by scaling down L cont from 500 to 13 nm. Even for the shortest L ch = 30 nm, where the channel resistance is negligible and the device is Schottky contact limited (I D -V DS is super-linear at low V DS in Fig. 2c), the contact resistance is independent of L cont . Moreover, the electrostatic properties of the device are also unaffected by scaling down L cont , as can be seen in Fig. 3b from the trend of SS CC and V T,CC (at V DS = 1 V) with L ch for two extreme contact lengths. The SS degradation and V T roll-off with shorter L ch are independent of the contact length. The insensitivity to L cont scaling also holds for other gate oxides and charge densities (plots not shown). In summary, for 3 ML MoS 2 , the active region of MoS 2 under the metal contact where most of the electrons get injected (called the transfer length L T ) is shorter than 13 nm.

These results agree very well with our previous TCAD simulations with overlapping back-gate. For thin MoS 2 (1-3 ML), these predict an L T smaller than the minimum simulated L cont of 2 nm (Fig S2). This is caused by the Schottky barrier (SB) at the metal-MoS 2 interface, which depletes the MoS 2 underneath even at a high gate field and prevents vertical electron injection. Therefore, injection is only allowed from the edge of the metal contact directly into the carrier-rich channel, which is also predicted in other work 18,19 . For thicker MoS 2 (more than 5 ML), the MoS 2 region underneath the contact is no longer depleted close to the oxide interface, and a longer section of the contact contributes to carrier injection [20][21][22] . In a top-gate-only configuration, the absence of gate field under the contact would cause the vertical injection to become even more ineffective for both thick and thin MoS 2 channels. As a result, the contact length can also be downscaled for top-gated devices without any performance penalty (Fig S2). Moreover, a reduction of the contact barrier or of the MoS 2 sheet resistance under the contact does not increase the L T for 1-3 ML MoS 2 , as the oblique trajectory still provides the least resistive path for carrier injection (Fig S2). However, such improvements could increase L T for thicker MoS 2 , where substantial carrier injection happens under the contact 19 .

In other work 21,23-25 , transfer lengths of 80 nm to 630 nm have been calculated using the transfer length method (x-axis intercept), but those values contradict our results. As argued elsewhere 26 , this method should not be used for thin TMD layers and Schottky contacts. The Schottky barrier fully depletes the TMD below the metal; therefore, the sheet resistance below the contact and in the channel are not the same, which is a requirement of the transfer length method. However, the transfer length method can still be reliably used for mobility calculation, because the mobility extraction does not require identical TMD sheet resistance in the channel and below the metal.
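To make the use of the transfer length method for mobility and contact resistance concrete, the sketch below fits the width-normalized total resistance to a straight line, R total = 2 R c + R sh L ch , and converts the slope into a mobility through μ = 1/(q n s R sh ). This is a generic textbook TLM fit, not the authors' analysis script; the function name `tlm_fit` and the synthetic data (generated from the R c and mobility values quoted in the text) are assumptions for illustration only. In line with the argument above, the x-axis intercept is deliberately not used to estimate a transfer length.

```python
import numpy as np

Q = 1.602176634e-19  # elementary charge, C


def tlm_fit(l_ch_um, r_total_ohm_um, ns_cm2):
    """Linear TLM fit of width-normalized resistance versus channel length.

    Returns (R_c in ohm.um, R_sh in ohm/sq, mobility in cm^2/V.s).
    """
    slope, intercept = np.polyfit(l_ch_um, r_total_ohm_um, 1)
    r_c = intercept / 2.0                  # two contacts in series
    r_sh = slope                           # (ohm.um)/um = ohm per square
    mobility = 1.0 / (Q * ns_cm2 * r_sh)   # from sheet conductance q*n_s*mu
    return r_c, r_sh, mobility


# Synthetic, noise-free data built from the values quoted in the text
# (R_c = 2.7 kOhm.um, mobility = 15 cm^2/V.s at n_s = 1e13 cm^-2).
ns = 1e13
r_sh_true = 1.0 / (Q * ns * 15.0)
l_ch = np.array([0.03, 0.1, 0.5, 1.0, 2.0, 5.0])   # channel lengths, um
r_total = 2 * 2700.0 + r_sh_true * l_ch            # ohm.um

r_c_fit, r_sh_fit, mu_fit = tlm_fit(l_ch, r_total, ns)
print(f"R_c = {r_c_fit:.0f} ohm.um, R_sh = {r_sh_fit:.0f} ohm/sq, mu = {mu_fit:.1f} cm^2/V.s")
```

With real data the fit would be applied to median R total values per channel length, and the scatter of the short-channel devices would dominate the uncertainty on R c .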
Contact resistance extraction. As we found in Section A that devices become more contact dominated as L ch is scaled, we now take a closer look at the value of the contact resistance. We extract the contact resistance (R c ) directly as half of the total device resistance (R tot /2) for devices with the shortest L ch = 30 nm, without any need for extrapolation as in the TLM. By considering R c ~ R tot /2, an upper limit is obtained for R c , as this assumes negligible channel resistance. Figure 3c shows a plot of R tot /2 at a charge density of 10 13 cm −2 versus L ch . For L ch < 50 nm, R tot /2 saturates, and we obtain a median nickel-MoS 2 R c ~ 3 kΩ.μm (at n s = 10 13 cm −2 ), which is in good agreement with the R c extracted using TLM (inset of Fig. 3c). Our R c values are comparable to those of state-of-the-art devices demonstrated with Au 20 or indium 27 contact metals. For increased V OV , the contact resistance drops further due to better carrier injection into the accumulated channel, and we obtain R c ~ 1.2-2 kΩ.μm at n s = 2 × 10 13 cm −2 (Fig. 3d). For even higher carrier densities (compare n s = 2 × 10 13 cm −2 to 2.7 × 10 13 cm −2 ), R c no longer improves significantly. Significant device-to-device variation in contact resistance is observed, possibly due to polymer residues between the contact metal and the MoS 2 , which were not completely removed after the transfer and contact lithography steps of the fabrication flow.

Long channel electrostatics and D it extraction. Figure 4a shows that the subthreshold swing SS CC , obtained at V DS = 0.05 V for different L ch , improves with thinner back-gate oxide due to better gate control of the charge in the channel. Consequently, we achieve the best subthreshold swing for the devices on the 4 nm HfO 2 substrate (Fig S3), with median SS min = 90 mV/dec and 110 mV/dec (at V DS = 0.05 V) for L ch = 50 nm and 30 nm, respectively.
In the long-channel limit i.e., L ch > 1 μm, SS CC saturates to a constant median value of 80 mV/dec, 150 mV/ dec, 1800 mV/dec for 4 nm HfO 2 , 12 nm HfO 2 , and 50 nm SiO 2 respectively. This is determined by the charging of MoS 2 /oxide interface and channel defects (60° grain boundaries 28 , and point defects 29 ), for which we calculate a trap density (D it,min ) of 4.5-7 × 10 12 cm −2 eV −1 from SS min . This range of D it value is roughly similar across the different dielectrics. We also confirm this D it value using multi-frequency C-V measurements of TiN/HfO 2 / MoS 2 MOScap 30 , where we obtain an acceptor-type trap density of 3.2-6 × 10 12 cm −2 eV −1 with energy levels near the midgap.
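The link between SS min and D it can be illustrated with the standard textbook relation SS = ln(10) (kT/q) (1 + C it /C ox ), with C it = q D it . The sketch below inverts this relation; it is an assumption that this simple form (neglecting any semiconductor depletion or quantum capacitance) is adequate here, and the function name `dit_from_ss` as well as the pairing of SS = 86 mV/dec with CET = 1.9 nm are illustrative choices rather than the authors' exact procedure.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-14  # vacuum permittivity, F/cm
K_SIO2 = 3.9             # relative permittivity of SiO2


def ideal_ss_mv_dec(temp_k=300.0):
    """Thermionic limit of the subthreshold swing in mV/dec."""
    return math.log(10) * K_B * temp_k / Q * 1e3


def dit_from_ss(ss_mv_dec, cet_nm, temp_k=300.0):
    """Interface trap density (cm^-2 eV^-1) from SS, assuming
    SS = SS_ideal * (1 + C_it / C_ox) and C_it = q * D_it."""
    c_ox = K_SIO2 * EPS0 / (cet_nm * 1e-7)                    # F/cm^2
    c_it = (ss_mv_dec / ideal_ss_mv_dec(temp_k) - 1.0) * c_ox
    return c_it / Q


# Illustrative pairing of values quoted in the paper (assumed to belong together).
dit_example = dit_from_ss(ss_mv_dec=86.0, cet_nm=1.9)
print(f"D_it ~ {dit_example:.2e} cm^-2 eV^-1")
```

For these inputs the estimate comes out at about 5 × 10 12 cm −2 eV −1 , inside the range quoted above; the result scales inversely with the assumed CET, so the CET uncertainty carries over directly into D it .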
From C-V measurements, we find that the MOS capacitance is systematically lower than the target oxide capacitance due to exposure to water and/or atmospheric carbon during the wet transfer process from the sapphire template to the target substrates. Figure 4b shows that the maximum accumulation capacitance (C acc ) is systematically lower than the oxide capacitance (C ox ), corresponding to an additional 1 nm of CET over the measured EOT. We calculate that the effect of quantum capacitance due to the limited density of states in MoS 2 , and the effect of the charge centroid being a few angstroms away from the interface, are insufficient to account for this 1 nm difference. As the MIMcaps are not exposed to water or polymer during the fabrication, the remaining difference between the CET and EOT values in Fig. 4b can be explained by a 0.4 nm thick layer of water or hydrocarbons adsorbed from the ambient, or a combination thereof. In the future, we expect that dry transfer in a controlled ambient will lower the CET, bringing it closer to the nominal EOT.
Figure 4 (panels b-d). (b) Systematically, the C acc is lower than C ox , corresponding to an additional 1 nm CET over the measured EOT. Simulations show this is caused by the quantum capacitance C q (MoS 2 having a lower DOS than metal), the impact of the charge centroid (CC) being further away in the MOS than in the MIM structure, and additionally by 0.4 nm of water or carbon residues stuck at the HfO 2 /MoS 2 interface during transfer. Qualitative comparison between (c) simulated and (d) experimental SS versus log(I D ) for different L ch . The simulated SS is for a uniform 3-monolayer MoS 2 with SBH = 0.45 eV. Two transport regimes at the contacts, thermionic emission and tunneling through the SB, are identified. In the thermionic regime, the relative increase of field in the channel from the source/drain Schottky contacts degrades gate control for short-L ch devices. In the tunneling regime, the nearly equal tunneling lengths for the different L ch result in a similar but degraded SS compared to the thermionic regime.

Short channel electrostatic degradation and variability. In the short-channel limit, i.e., L ch < 100 nm, Fig. 4a shows a degradation of the median SS CC as well as increased scatter (SS CC at V DS = 1 V in Fig S4). A similar trend is also seen for SS min (Fig S3). We hypothesize that the increased median and scatter could both be caused by the Schottky contacts, where the median SS degradation with shorter L ch is related to the relative increase of depletion regions from the Schottky contacts, while the scatter could be due to the variation in Schottky barrier height 31 (SBH) induced by the non-uniform thickness of the MoS 2 , seen in the AFM image in Fig. 1a. We first verify the hypothesis of degraded median SS for shorter L ch by comparing representative experimental SS versus I D curves to simulations in Fig. 4c. We consider full SS-I D curves instead of extracting SS at a single current level to understand the injection mechanism in a wider operation range.
The simulations are performed for a SBH = 0.45 eV and uniform 3 ML MoS 2 channel. We observe two different regimes for SS for both the simulated and experimental data. In the first low-current regime (I D < 1e−9 A/μm), the current is limited by the thermionic emission of carriers from the metal into the channel. Here, the barrier for electrons consists of the highest position of conduction band edge inside the channel determined by the gate-bias. In this low-current regime, SS is determined by the change in the conduction band edge with gate-bias. As discussed in section D, the lower limit for SS (which corresponds to SS min in Fig S3), is defined by the interface trap density. The degradation of SS min for short-L ch devices is due to the electrostatic potential of the source and drain metallurgical junctions influencing the channel potential and degrading the gate control. This is illustrated in Fig S5 where the conduction band energy is flat over most of the device for L ch = 100 nm, while it is lowered for L ch = 30 nm with the region of maximum barrier reducing to a small portion near the center of the device. Note that this effect is similar to conventional MOSFETs. The second regime (I D > 1e−9 A/μm) is reached when the conduction band in the channel is lowered further, and carriers can efficiently tunnel through the SB (Fig S6). Here, the thermionic component over the barrier saturates and the tunneling path length determines the current. Because it continuously changes with higher V GS , the SS is worse than the first regime. Correspondingly, in the experimental devices, the SS CC extracted at I D > 1e−8 A/μm (for L ch < 100 nm) shows a higher value than SS min and stronger degradation with L ch . The SS for a given I D also becomes nearly independent of L ch , because the tunneling path length depends only on the gate voltage and the thicknesses and dielectric permittivities of the TMD 32 and oxide, for the low lateral electric field (V DS = 0.05 V). 
This is illustrated in Fig S6 where the conduction band energy and tunneling rate are plotted along the edge carrier injection path for L ch = 30 nm and 100 nm, showing no significant difference. With further reduction in SBH, the SS value in the second regime improves, reaching closer to the thermionic limit of the first regime.
We study the increased SS scatter seen experimentally for short L ch using simulations of devices with different uniform MoS 2 channel thicknesses and SBH. Figure 5a shows the simulated SS value for two different SBH (0.45 eV, 0.75 eV) and three different uniform thicknesses (1, 3 and 5 layers) of MoS 2 for L ch = 30 nm. Similar to the above case, we note two different regimes for SS irrespective of the barrier height. For the first regime of low I D (< 1e−8 A/μm for SBH = 0.45 eV and < 1e−11 A/μm for SBH = 0.75 eV), the SS is determined only by thermionic emission over the channel barrier. Therefore, the SS is independent of the channel thickness. However, the SS degrades for SBH = 0.75 eV compared to 0.45 eV, because the higher Schottky barrier field penetrates deeper into the channel. For the second regime of high I D (> 1e−8 A/μm for SBH = 0.45 eV and > 1e−10 A/μm for SBH = 0.75 eV), the SS depends on the tunneling length, which is sensitive to the thickness of the semiconductor among other parameters 33 . Consequently, the gate control over the Schottky barrier, and hence over the tunneling length, reduces with thicker MoS 2 , resulting in poor SS for the 5 ML MoS 2 (Fig S7). In agreement with this observation, we also note that the difference in SS between the layers is more pronounced for the higher SBH of 0.75 eV.
In our experiments, we have even more variability due to non-uniform thickness within a single device. Even for the smallest functional device footprint (L ch ~ 30 nm × W ch ~ 200 nm), we always have a high probability (~ 70%) of having a mixed device, i.e., regions of 3, 4 and 5 layers of MoS 2 within the same device. This is illustrated in Fig. 5b, where the representative AFM image of the material (Fig. 1a) was used to compute the probability of fabricating devices with different dimensions on only 3, only 4 or only 5 layers, or on a mix of those layers. These mixed-thickness devices, together with the associated SBH variations, would result in non-uniform gate control and large scatter in the SS values of experimental devices. Also, note that the grain size and defects in the closed layers (1-3 ML) could additionally impact the device variability.
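One possible way to obtain such probabilities from a thickness map is sketched below: a device footprint is slid over a pixel map of layer counts, and each placement is classified as uniform (a single layer count) or mixed. The exact procedure used for Fig. 5b is not described in the text, so the sliding-window approach, the function name `footprint_statistics` and the toy 4 × 4 map are all assumptions for illustration; a real analysis would start from the segmented AFM image.

```python
import numpy as np


def footprint_statistics(layer_map, len_px, wid_px):
    """Fraction of device placements that fall on a single layer count
    (keyed by that count) or on a mix of layer counts (key 'mixed')."""
    rows, cols = layer_map.shape
    counts = {}
    total = 0
    for r in range(rows - len_px + 1):
        for c in range(cols - wid_px + 1):
            window = layer_map[r:r + len_px, c:c + wid_px]
            layers = np.unique(window)
            key = int(layers[0]) if layers.size == 1 else "mixed"
            counts[key] = counts.get(key, 0) + 1
            total += 1
    return {key: n / total for key, n in counts.items()}


# Toy layer-count map (hypothetical, not derived from the AFM data).
toy_map = np.array([
    [3, 3, 4, 4],
    [3, 3, 4, 4],
    [3, 3, 5, 5],
    [3, 3, 5, 5],
])

stats_small = footprint_statistics(toy_map, 1, 1)  # single-pixel devices
stats_large = footprint_statistics(toy_map, 2, 2)  # 2 x 2 pixel devices
print(stats_small)
print(stats_large)
```

In this toy example, single-pixel footprints are never mixed, while 4 of the 9 possible 2 × 2 placements straddle a thickness boundary, mirroring the trend that larger footprints are more likely to be mixed.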
Threshold voltage control. We analyse V T control for decreasing channel length, and Fig. 5c shows that there is no significant median V T roll-off at V DS = 0.05 V. With a higher V DS = 1 V, we notice a V T roll-off of about 200 mV from L ch = 500 nm to 30 nm. We attribute this roll-off to the higher lateral electric field across the reverse-biased Schottky contact, because V DS is fixed at 1 V for all L ch . This higher electric field allows for increased carrier injection in short channel devices, which lowers V T . This roll-off could be mitigated by improving the gate control through gate-oxide scaling, or by reducing the amount of defects at the MoS 2 /oxide interface. V T control for decreasing channel width is also shown in Fig. 5c, and no systematic impact is seen as W ch is scaled from 1 μm down to 200 nm. However, we note that the narrow devices (W ch = 200 nm) show higher V T variability than wider devices (W ch = 1 μm), especially at V DS = 0.05 V. This increased V T variability could be attributed to the higher probability of finding devices on discrete layers (Fig. 5b) for narrower channel compared to a wider channel where the devices are always mixed. Other sources of variability such as bias-temperature instability, non-uniformity of the MoS 2 grains etc. could also impact the V T variability and more dedicated experiments are required.

Benchmark, projection and conclusion
We present a benchmark chart (Fig. 5d) to compare the performance of our devices against flake and CVD 2D material FETs in literature [34][35][36][37][38][39][40][41][42][43] . We choose the peak transconductance (g m,max ) measured at V DS = 1 V and SS min as the two metrics for comparison, similar to conventional Si transistors. The best corner is on the top-left, since low SS min and high g m,max are desired. Our SiO 2 devices, owing to the thick EOT, provide low transconductance even for the shortest L ch devices. By scaling the EOT (12 nm HfO 2 and 4 nm HfO 2 ) and using an optimized process flow (see Methods), we gain both in transconductance and SS, achieving an R c < 2 kΩ.μm for Ni contact metal and D it < 5 × 10 12 cm −2 eV −1 for a CET of 1.9 nm. We demonstrate the highest g m,max = 185 μS/μm at V DS = 1 V and a minimum SS of 86 mV/dec for 4 nm HfO 2 . We also achieve I max = 400 μA/μm at V DS = 1 V and V GS = 4 V for our 12 nm HfO 2 samples (Fig S8).

Although the performance of our 2D devices is among the best in literature, significant improvements are still needed to make 2D materials competitive with silicon channel devices for high-performance logic applications. Therefore, we propose a roadmap using the Power Performance Area (PPA) metric for technology comparison in Fig. 6. 2D-FET and silicon nanosheet technology are compared using an inverter-based ring oscillator circuit, where each device consists of 4 vertically stacked sheets with scaled L g = 14 nm and a gate-all-around structure, corresponding to the imec 2 nm node 44 . All devices are retargeted to an I off = 2 nA at V dd = 0.7 V, and the inverter circuit area is kept the same for a fairer comparison between technologies. Starting from the baseline case (A), where experimental channel and contact parameters are assumed, the performance strongly improves in (B) when the Schottky barrier height is reduced.
In (C), improvements to the 2D channel mobility result in a higher ring-oscillator operating frequency compared to silicon, owing to the superior electrostatic control of the 2D devices at shorter gate lengths. In (D), the ideal performance is simulated with more aggressively optimized material parameters.
In conclusion, we have scaled down the different device dimensions of CVD-grown MoS 2 FETs and demonstrated g m,max = 185 μS/μm and SS min = 86 mV/dec, which are among the best in literature. Using our large dataset, we systematically identified the key obstacles to be tackled to outperform silicon. First, we showed that scaling L cont for thin MoS 2 does not impact the short channel performance, which allows for an overall reduction in the device footprint and enables device and circuit level gate optimization 45 . Second, we identified that for L ch < 100 nm, the on-current is currently limited by high Schottky contact resistance (R c = 1-2 kΩ.μm) at low V DS , and by a combination of velocity saturation and the Schottky barriers at high V DS . Third, we identified that SS degrades at short channel lengths, which can be mitigated by reducing the CET and lowering the Schottky barrier height. Reducing the CET is therefore crucial to keep optimal electrostatic control of the thin channel. We established that a 0.4 nm layer of water or adsorbed hydrocarbons (or a combination thereof) at the HfO 2 /MoS 2 interface is the root cause of the lower-than-expected gate capacitance, i.e., a CET larger than the EOT. This value is consistent across different thicknesses of HfO 2 . Therefore, an optimized transfer process free of water and carbon is needed to enable gate stack scaling below 1 nm, and additionally to allow upscaling to 300 mm wafer processing. Finally, we have demonstrated using a PPA analysis that if the obstacles of Schottky contacts, gate stack scaling and mobility improvement can be tackled, MoS 2 FETs will significantly outperform silicon GAA FETs at the imec 2 nm node and beyond. Therefore, they are excellent candidates to continue logic scaling.

Methods
Device fabrication. For the device design, we use the back-gate configuration with top-contacts (Fig. 1b).
The fabrication flow is summarized in Fig. 1c. The MoS 2 is delaminated from the sapphire growth substrate using water intercalation and transferred to three different target substrates: (1) Si/50 nm SiO 2 , (2) Si/50 nm SiO 2 /5 nm TiN/12 nm HfO 2 , or (3) Si/50 nm SiO 2 /5 nm TiN/4 nm HfO 2 . Before transfer, the target substrates are pre-cleaned using a solvent rinse, followed by an optimized forming gas anneal (FGA) or soft O 2 plasma, for SiO 2 and HfO 2 back-gate oxides, respectively. The active channels are patterned using a PMMA mask and e-beam lithography, followed by reactive ion etching (Cl 2 + O 2 ) of the MoS 2 . Source and drain contacts of different lengths (L cont ) with different channel lengths (L ch ) are subsequently defined on the active channel by another e-beam lithography exposure of ZEP520A-2 resist (ZEON Corp.), e-beam evaporation of 10 nm Ni, and metal lift-off in anisole. We ensure a chamber pressure below 10 −6 Torr while depositing the Ni contact metal. Finally, in a third e-beam lithography step, thicker Ni/Pd contact pads are lifted off.
TCAD calibration. All simulations 46 are performed in Synopsys Sentaurus Device. The low-field mobility (μ eff ) is calibrated from the experimental TLM fit shown in Fig. 3c and implemented using a constant mobility model. An estimate for D it is obtained from multi-frequency C-V measurements as discussed in Section D. An acceptor trap distribution that is uniform over the entire bandgap is assumed, with D it = 3e12 cm −2 eV −1 . With μ eff and D it fixed by experiments, the Schottky barrier height is fitted to the median transfer characteristics of L ch = 30 nm devices, which are predominantly contact-limited. For the Schottky injection, the non-local tunneling model based on the Wentzel-Kramers-Brillouin approach is used. All the parameters used in the simulation correspond to their median values.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.