Highly Linear and Symmetric Analog Neuromorphic Synapse Based on Metal Oxide Semiconductor Transistors with Self‐Assembled Monolayer for High‐Precision Neural Network Computation

This work presents an analog neuromorphic synapse device consisting of two oxide semiconductor transistors for high‐precision neural networks. One of the two transistors controls the synaptic weight by charging or discharging the storage node, which leads to a conductance change in the other transistor. The programmed weight is maintained for more than 300 s because electrons in the storage node are well preserved by the extremely low off current of the oxide transistor. Ideal synaptic behaviors are achieved by utilizing superior properties of oxide transistors such as a high on/off ratio, low off current, and large‐area uniformity. To further improve the synaptic performance, a self‐assembled monolayer treatment is applied to reduce the transistor conductance. The reduction of on current reduces the power consumption, and the reduced off current improves the retention characteristics. There is no noticeable decrease in simulated neural network accuracy even when the measured device‐to‐device variation is intentionally increased by 200%, indicating the possibility of large‐array operation with the synapse device.


Introduction
Artificial intelligence is now widely used in applications such as computer vision, [1] voice recognition, [2] natural language processing, [3] and autonomous driving. [4] These applications demand repetitive training of synaptic weights in neural networks with a vast amount of data to achieve high accuracy. [5] However, as the neural network size grows rapidly, electronic systems based on the conventional von Neumann architecture suffer from large delays and high power consumption. [...] field-effect transistors (MOSFETs) were proposed as an alternative approach. [16] MOSFET synapse devices showed excellent linearity, but they required continuous refresh operations due to the high off current, resulting in high power consumption. Other researchers have reported synapse devices utilizing oxide semiconductor transistors to improve the retention characteristics; [17,18] however, the previous results exhibited a limited number of weight levels (<10). Non-ideal synaptic characteristics in previous neuromorphic devices, for example, device-to-device and cell-to-cell variations, poor endurance, and nonlinear/asymmetric weight modulation, are known to reduce the inference accuracy of neural networks. [19,20] Therefore, a new synapse device that satisfies all the requirements of ideal synaptic behavior is needed for high-precision neural network applications.
In this work, we propose a novel analog neuromorphic synapse device utilizing two indium gallium zinc oxide (IGZO) transistors. Outstanding linear/symmetric weight update properties were achieved by balancing the electrode charging rate with the conductance change in the IGZO transistor. In addition, since the weight can be gradually modulated by an identical pulse train regardless of the currently stored weight, our synapse device is optimized for gradient-descent programming, which is the most widely used training method in neural network systems. [21] Despite these advantages, IGZO transistors generally exhibit high output current (>1 µA), leading to high power consumption and complex analog-to-digital converter design in large-array configurations. [22] The output current of the IGZO transistor can be controlled by adjusting the IGZO composition ratio [23] and oxygen flow during deposition, [24] but these approaches may result in reliability degradation. To reduce the output current effectively without deteriorating the reliability, we introduced a self-assembled monolayer (SAM) that can form an ultrathin dielectric on the surface of the oxide semiconductor. [25] By inserting the SAM between the IGZO and the source/drain electrodes, we successfully reduced the output current. With this approach, our IGZO 2T synapse exhibited outstanding synaptic behavior compared to previously reported synapse devices.

Analog Synapse Device with Two IGZO Transistors
For highly linear and symmetric analog synaptic properties, we propose a synapse device consisting of two IGZO transistors, one for write and the other for read operations. The synaptic weight can be controlled by charging or discharging the storage node through the write transistor, which leads to a conductance change of the read transistor (Figure 1). After a programming operation is completed, the synaptic weight can be maintained due to the extremely low off current of the IGZO transistor (<50 fA). During neural network inference, the weight stored as the conductance (G_read) is multiplied by the input voltage (V_read) applied to the drain of the read transistor, which operates in the linear regime.
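As a minimal sketch of the read operation described above, the snippet below computes the output current of one read transistor as G_read × V_read and sums such currents along a shared output line. The column wiring, the function names, and the three-cell example are illustrative assumptions rather than details taken from this work; the 1.3 µS and 50 mV values are the maximum conductance and read voltage quoted in the measurement section.

```python
def read_current(g_read, v_read):
    """Ohmic read: output current (A) of one read transistor in the linear regime."""
    return g_read * v_read


def column_current(g_column, v_inputs):
    """Sum the currents of cells sharing one output line (assumed array wiring)."""
    if len(g_column) != len(v_inputs):
        raise ValueError("need exactly one input voltage per cell")
    return sum(read_current(g, v) for g, v in zip(g_column, v_inputs))


# Single cell at 1.3 uS read with 50 mV (values quoted in the text).
i_cell = read_current(1.3e-6, 50e-3)

# Hypothetical three-cell column: stored weights (S) and input voltages (V).
i_col = column_current([0.25e-6, 0.80e-6, 1.30e-6], [0.02, 0.05, 0.10])

print(f"cell: {i_cell * 1e9:.1f} nA, column: {i_col * 1e9:.1f} nA")
```

Each column sum is one element of a matrix-vector product, which is why the read transistor has to stay in the regime where its current is proportional to V_read.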

Self-Assembled Monolayer Treatment for Output Current Reduction
To achieve low-power and massive-scale array operation, the output current of the synapse device should be reduced. A common method of controlling the current density of the IGZO transistor is to change the atomic composition ratio of the semiconductor thin film and the annealing conditions. [23,24] However, such approaches have certain limitations as they require complicated process optimizations and most previous efforts have been targeted to improve carrier mobility rather than to reduce it. To decrease the transistor current without affecting the device's reliability, we applied a SAM treatment on the surface of IGZO. We utilized an alkyl-phosphonic acid SAM with different alkyl chain lengths: n = 10 (C-10) and n = 12 (C-12) as shown in Figure 2a.
The thickness of the SAM varied from 10 Å (C-10) to 15 Å (C-12). [26] An alkyl-phosphonic acid SAM with a large bandgap forms an energy barrier between the source/drain electrodes and the IGZO channel, inhibiting carrier transport (Figure 2b). [27] As a result, the output current decreases as the alkyl chain length increases (Figure 2c,d). The positive V_TH shift in C-12-SAM-treated samples resulted from an increased dipole moment in the longer alkyl chain. [28,29] A high current is known to limit the array size when configuring a synapse device array with more than 100 rows and columns because of a significant IR drop. [18] Therefore, we set a goal to reduce the maximum output current to 1 nA for large-array configurability by utilizing C-12 SAM, which reduces the on-current of the unit transistor by 30%.

Electrical Properties of IGZO 2T Synapse Device
Our synapse device consists of two IGZO transistors with a bottom-gate structure and four input/output electrodes, as illustrated in Figure 3. The channel length of both the read and write transistors was 20 µm, while the channel width was 100 and 80 µm for the read and write transistors, respectively. The gate and source/drain overlap of the write transistor was 80 µm × 10 µm, and that of the read transistor was 100 µm × 200 µm. The gate electrode of the read transistor and one of the source/drain electrodes of the write transistor were connected to form a storage node, which is charged or discharged to control the synaptic weight. Since the storage node capacitance is determined by the gate/source overlap in the read transistor, it was designed to have an extended overlap compared to the write transistor (Figure 3a,b). All the transistors were fabricated through a standard process, consisting of an Al bottom gate, SiO2 gate insulator, IGZO channel, and Mo/Al source/drain electrodes. [30–32] In addition, a SAM treatment was applied to the IGZO surface to reduce the output current. The IGZO transistors exhibited a high on/off ratio (>10^7) and a low off current below the detection limit (<50 fA). The current difference between the write and read transistors was proportional to the channel width (see Figure 3c,d).

Figure 1. Operation scheme of IGZO-transistor-based synapse device. The conductance of the read transistor (G_read) is gradually controlled by charging/discharging the storage node through the write transistor. While the pulse input was applied to the gate of the write transistor, the drain/source potential was set to a high or low DC voltage for potentiation or depression, respectively. The stored charges at the storage node are maintained due to the extremely low off current of the IGZO transistor (<50 fA).

www.advelectronicmat.de
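To give a feel for the storage-node sizing discussed in this section, the following back-of-the-envelope sketch estimates the capacitance of the read-transistor overlap and the worst-case voltage droop caused by leakage. It rests on several assumptions that are not stated in this work: an ideal parallel-plate capacitor, the full 100 µm × 200 µm overlap acting as the plate area, and a relative permittivity of 3.9 for the SiO2 insulator. The 30 nm insulator thickness comes from the Experimental Section, and 50 fA is the detection limit quoted above, so the droop rate is only an upper bound.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def parallel_plate_capacitance(area_m2, thickness_m, eps_r):
    """Ideal parallel-plate capacitance C = eps0 * eps_r * A / d (farads)."""
    return EPS0 * eps_r * area_m2 / thickness_m


def droop_rate(i_leak, capacitance):
    """Voltage change per second of a node leaking a constant current: dV/dt = I / C."""
    return i_leak / capacitance


# Read-transistor overlap from the text; eps_r = 3.9 is an assumed SiO2 value.
overlap_area = 100e-6 * 200e-6                      # m^2
c_sn = parallel_plate_capacitance(overlap_area, 30e-9, eps_r=3.9)

# 50 fA is the measurement limit, so this is an upper bound on the droop.
max_droop = droop_rate(50e-15, c_sn)

print(f"C_SN ~ {c_sn * 1e12:.1f} pF, droop < {max_droop * 1e3:.2f} mV/s")
```

Under these assumptions a larger overlap slows the droop in direct proportion, which is consistent with the extended overlap chosen for the read transistor.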

Operation Scheme for Linear and Symmetric Weight Programming
As shown in the operation scheme in Figure 4, the bias condition was optimized for potentiation and depression. The input pulse applied to the write transistor gate was 4 V peak-to-peak with an offset of −1 V to turn the write transistor completely on and off. The pulse width was initially set to 20 ns, slightly larger than the minimum pulse width (15 ns) of the function generator (Keysight 33520B). Prior to potentiation, V_SN was initialized to 0 V. In Figure 4a, G_read is plotted as a function of the number of pulses for various V_high from 1 to 3 V during the potentiation process; G_read was obtained from the storage node voltage (V_SN) calculated by Equation (1)

$$V_{SN}(t_n + t_{pulse}) = V_{SN}(t_n) + \frac{1}{C_{SN}} \int_{t_n}^{t_n + t_{pulse}} I_{charge}\,\mathrm{d}t \qquad (1)$$

where C_SN is the capacitance of the storage node, t_n is the time at which the nth input pulse starts, t_pulse is the input pulse width, and I_charge is the charging current through the write transistor. While V_SN increases from 0 V, I_charge decreases monotonically due to the reduced V_DS of the write transistor. Combined with the increasing transconductance of the read transistor, this results in a similar ΔG_read per pulse. Accordingly, the potentiation exhibits an intrinsically linear characteristic, while the potentiation slope can be adjusted by V_high. Over 1200 pulses, the average potentiation slope becomes steeper with an increase in V_high: 0.74 nS per pulse with V_high = 1 V, 1.48 nS per pulse with V_high = 2 V, and 1.74 nS per pulse with V_high = 3 V.

During the depression, G_read with various V_low from −0.75 to −1.25 V was obtained (Figure 4b), assuming V_SN was initially charged to 1 V. V_low was set in a range that can completely turn off the read transistor (Figure 2). G_read was calculated from the V_SN obtained by Equation (2)

$$V_{SN}(t_n + t_{pulse}) = V_{SN}(t_n) - \frac{1}{C_{SN}} \int_{t_n}^{t_n + t_{pulse}} I_{discharge}\,\mathrm{d}t \qquad (2)$$

where I_discharge is the discharging current through the write transistor. While V_SN decreases from 1 V, I_discharge gradually decreases as both V_GS and V_DS of the write transistor decrease. Unlike potentiation, the decreased discharging speed at the storage node and the decreased transconductance reinforce each other, so ΔG_read per unit pulse shrinks during the depression. Therefore, G_read is intrinsically nonlinear in the depression operation. To achieve linear and symmetric weight programming characteristics, we first selected V_high as 3 V and V_low as −0.75 V so that the potentiation and depression operations have the most similar slopes. Then, G_read as a function of the number of pulses with various pulse widths from 16 to 24 ns during potentiation and depression was obtained by using Equations (1) and (2) (Figure 4c). As the pulse width increases, the programming slope increases, so symmetrical programming can be achieved by setting an appropriate pulse width. Before setting the pulse widths, the conductance range was determined to be 0.25–1.3 µS.

Figure 4. Operation scheme of the synapse device for outstanding linearity and symmetry. Operational parameters (V_high, T_pot, V_low, T_dep) and conductance range were optimized to maximize the synaptic device properties. a) The conductance of the read transistor (G_read) as a function of the number of pulses with various V_high during potentiation. The potentiation operation exhibits linear characteristics as the decreased charging rate of the storage node and the increased transconductance of the read transistor compensate for each other. As V_high increases, the V_DS of the write transistor increases, which in turn enhances the charging current and the programming slope. b) G_read as a function of the number of pulses with various V_low during the depression. As the depression progresses, the decreased discharging rate of the storage node and the decreased transconductance of the read transistor result in a gradual decrease of the depression slope. As V_low decreases, both V_GS and V_DS of the write transistor increase, which increases the discharging current and the programming slope. V_high and V_low were selected to be 3 and −0.75 V, respectively, to exhibit the most similar programming slope. c) G_read as a function of the number of pulses with various pulse widths (T_pot, T_dep) during potentiation and depression using optimized input voltages: V_high = 3 V and V_low = −0.75 V. As the pulse width increases, the programming slope also increases. To match the potentiation and depression slopes, T_pot and T_dep were selected as 21 and 16 ns, respectively. d) G_read as a function of the number of pulses during potentiation and depression with optimized operational parameters: V_high = 3 V, V_low = −0.75 V, T_pot = 21 ns, and T_dep = 16 ns. The operating range, G_read: 0.25–1.3 µS, was determined to have the most linear programming characteristics by suppressing the difference in ΔG_read per pulse. With optimized parameters and conductance range, the programming slope for both potentiation and depression was 1.93 nS per pulse.
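The pulse-by-pulse calculation of V_SN referred to as Equations (1) and (2) amounts to a charge balance on the storage node, ΔV_SN = (1/C_SN) ∫ I dt over each pulse. The sketch below integrates that balance numerically for a train of identical pulses. Everything device-specific in it is a placeholder: the storage-node capacitance, the write-path conductance, and above all the linear current law are hypothetical stand-ins for the real write-transistor characteristics, so the resulting trace is a simple exponential and does not reproduce the near-linear G_read curves of Figure 4. The bias values and pulse widths are the optimized ones quoted in the text.

```python
def apply_pulse(v_sn, current_fn, c_sn, t_pulse, substeps=10):
    """Integrate dV_SN = I(V_SN) * dt / C_SN over one pulse (forward Euler)."""
    dt = t_pulse / substeps
    for _ in range(substeps):
        v_sn += current_fn(v_sn) * dt / c_sn
    return v_sn


def pulse_train(v_start, current_fn, c_sn, t_pulse, n_pulses):
    """Return V_SN before the first pulse and after each of n_pulses pulses."""
    trace = [v_start]
    for _ in range(n_pulses):
        trace.append(apply_pulse(trace[-1], current_fn, c_sn, t_pulse))
    return trace


# Hypothetical device constants (NOT from the paper).
C_SN = 23e-12      # storage-node capacitance in F
G_WRITE = 1e-6     # effective write-path conductance in S

# Bias and timing values quoted in the text.
V_HIGH, V_LOW = 3.0, -0.75
T_POT, T_DEP = 21e-9, 16e-9


def i_charge(v_sn):
    """Placeholder charging current: shrinks as V_SN approaches V_high."""
    return G_WRITE * (V_HIGH - v_sn)


def i_net_depression(v_sn):
    """Placeholder net current during depression (negative of I_discharge)."""
    return -G_WRITE * (v_sn - V_LOW)


pot_trace = pulse_train(0.0, i_charge, C_SN, T_POT, 1200)
dep_trace = pulse_train(1.0, i_net_depression, C_SN, T_DEP, 1200)
print(f"V_SN after potentiation: {pot_trace[-1]:.3f} V")
print(f"V_SN after depression:   {dep_trace[-1]:.3f} V")
```

Swapping `i_charge` and `i_net_depression` for a measured or modeled transistor current, and mapping V_SN to G_read through the read-transistor transfer curve, would be the natural next step; both are left out here because the text does not give them in closed form.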

Potentiation and Depression Characteristics
We measured the output conductance of our synapse devices during potentiation and depression (P/D) with the optimized operational parameters in Figure 4 and analyzed their linearity, symmetry, and device-to-device variation. While the storage node was charged or discharged, the drain current of the read transistor was measured with 50 mV applied to V_read. To verify the effects of SAM treatment, the potentiation and depression characteristics with and without SAM were compared (Figure 5a). With SAM, the maximum conductance was reduced from 2.6 to 1.3 µS because of the reduced output current of the read transistor. This conductance is much lower than that of typical oxide RRAM devices (≈10^−3–10^−1 S), [33,34] and comparable to that of flash-memory-based synapse devices (≈1–10 µS). [35] Considering that flash memory requires long programming/erasing time and high voltage, the IGZO 2T synapse is advantageous in terms of programming speed and power. Previous FeFET synapse devices were reported to have a much smaller number of weight levels (≈16–64) with a comparable or higher conductance range (≈1–100 µS). [36] Previously, typical approaches to reduce the output current of synapse devices involved a change in materials composition or scaling-down of device dimensions. However, those methods have clear limitations, such as degradation in linearity or inter-device variation, due to increased randomness. [10,37] Moreover, scaling may lead to a narrowed memory window due to an increased ratio of the edge dead region in a scaled device. [38] We successfully achieved a low conductance range without any such degradation by inserting the SAM between the source/drain electrodes and the IGZO. As shown in Figure 5b, C-12 SAM improved the retention characteristics by further suppressing the off current of the write transistor. We note that it was difficult to directly measure the off current as it was already lower than the measurement limit (<50 fA).
With C-12 SAM treatment, our synapse device maintained over 85% of the initial conductance after 300 s. Although the retention time is relatively short compared to oxide RRAM devices (>10^4 s), [33,34,39–41] it is still sufficient to maintain the weight between the training epochs. [42,43] Synapse devices must exhibit low device-to-device variation in an array for configuring a large-scale neural network. [44] Therefore, to verify the uniformity of the IGZO 2T synapse devices, the P/D data were obtained from 16 devices on a 4″ wafer, as shown in Figure 5c-e. The programming slopes were 1.95 ± 0.01 and −1.92 ± 0.05 nS per pulse for potentiation and depression, respectively. The almost identical programming slope values indicate outstanding symmetry, and the minimal standard deviations confirm the superior device-to-device uniformity. From the measured P/D curves, nonlinearity factors (α_p, α_d) were extracted using the equations below [45]

$$G_p = B\left(1 - e^{-N/A_p}\right) + G_{min} \qquad (3)$$

$$G_d = -B\left(1 - e^{(N - N_{max})/A_d}\right) + G_{max} \qquad (4)$$

$$B = \frac{G_{max} - G_{min}}{1 - e^{-N_{max}/A}}$$

where G_p and G_d are the conductance when the Nth pulse is applied during potentiation and depression, respectively. G_max denotes the maximum conductance when the maximum number of pulses (N_max) is applied during the potentiation, and G_min is the minimum conductance of the device. B is the fitting parameter that correlates G and A. A_p and A_d are the fitting parameters that represent non-linear characteristics during programming, and larger absolute values lead to more linear P/D curves as the exponent becomes smaller in Equations (3) and (4). A_p and A_d were converted to α_p and α_d according to the one-to-one table provided by the NeuroSim multilayer perceptron simulator. [45,46] All 16 devices showed remarkably low nonlinearity at 550 weight levels with only small variation: α_p = −0.14 ± 0.09 and α_d = −0.57 ± 0.33. The α values of both potentiation and depression are negative, which suggests convex P/D curves overall.
The asymmetry factor was quantified from the nonlinearity factors using the equation below [45]

$$\mathrm{Asymmetry} = \left|\alpha_p - \alpha_d\right|$$

From the 16 synapse devices, the asymmetry factor was extracted to be 0.43, which was much lower than the asymmetry factor of conventional synapse devices, with typical values over 0.5. [36,39–41] For matrix-vector multiplication (MVM), it is essential that the programmed weight is linearly multiplied by the vector input. If V_read is controlled within a range that does not cause pinch-off, the output current of the read transistor increases linearly with respect to V_read. We carried out P/D measurements with V_read varying from 20 to 100 mV and confirmed that the output current increases in proportion to V_read, as shown in Figure 5f.
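The exponential P/D model and the asymmetry factor used above can be written down compactly. The sketch below follows the form of the model commonly associated with the NeuroSim simulator cited as [45], as far as it can be recovered from the symbol definitions in the text; treat the exact expressions as an assumption. The value of A is hypothetical, and the conversion from A to α relies on a lookup table that is not reproduced here. N_max = 550 assumes one pulse per weight level, and the conductance range is the 0.25–1.3 µS operating window.

```python
import math


def _b(g_min, g_max, n_max, a):
    """Scale factor B that pins the model curve to G_min and G_max."""
    return (g_max - g_min) / (1.0 - math.exp(-n_max / a))


def g_potentiation(n, a_p, g_min, g_max, n_max):
    """Conductance at state index n on the potentiation branch."""
    return _b(g_min, g_max, n_max, a_p) * (1.0 - math.exp(-n / a_p)) + g_min


def g_depression(n, a_d, g_min, g_max, n_max):
    """Conductance at state index n on the depression branch.

    n is the state index, so depression walks n from n_max down to 0.
    """
    b = _b(g_min, g_max, n_max, a_d)
    return -b * (1.0 - math.exp((n - n_max) / a_d)) + g_max


def asymmetry(alpha_p, alpha_d):
    """Asymmetry factor as the absolute difference of nonlinearity factors."""
    return abs(alpha_p - alpha_d)


G_MIN, G_MAX, N_MAX = 0.25e-6, 1.3e-6, 550
A_HYPOTHETICAL = 2000.0  # placeholder; large |A| means a nearly straight line

mid_p = g_potentiation(N_MAX // 2, A_HYPOTHETICAL, G_MIN, G_MAX, N_MAX)
mid_linear = G_MIN + (G_MAX - G_MIN) * (N_MAX // 2) / N_MAX
print(f"deviation from a straight line at mid-range: {(mid_p - mid_linear) * 1e9:.1f} nS")
print(f"asymmetry from reported alphas: {asymmetry(-0.14, -0.57):.2f}")
```

With the reported α_p = −0.14 and α_d = −0.57, the absolute difference reproduces the asymmetry factor of 0.43 given in the text.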
Our IGZO 2T synapse exhibits minimal nonlinearity compared to previously reported RRAM and PCM devices (|α| > 1). [33,39,40] While the conductance of conventional RRAM and PCM is modulated through changes in the materials' properties, our synapse device controls the conductance with the number of stored electrons in the storage node, which is advantageous for high linearity and symmetry through optimized control of the charging/discharging operation. Moreover, as no stochastic processes such as ion diffusion or phase change are involved, excellent device-to-device uniformity can be achieved. We summarized the characteristics of the IGZO 2T synapse and previous representative synapse devices in Table 1.

Multi-Layer Perceptron Simulation Using IGZO 2T Synapse
To verify the image classification performance of the synapse device, multi-layer perceptron training was simulated using the Modified National Institute of Standards and Technology (MNIST) dataset of handwritten digits. [47] We utilized "aihwkit" as a benchmark tool in the PyTorch environment, and the simulation parameters, such as nonlinearity, asymmetry, device-to-device variation, and weight level quantization, were all extracted from the measured data in Figure 5. [48,49] The neural network for MNIST learning consisted of two hidden layers (784-256-128-10), and the resolution of the ADC, which converts the output current into values in the neural layers, was set to 9 bits (512 levels) considering the maximum of 550 weight levels of our synapse devices.

Figure 5. a) [...] Figure 4d was used to obtain the P/D data. b) Comparison of retention characteristics of the IGZO synapse with (red) and without (black) SAM treatment. Inset shows the measurement condition for the retention characteristics after programming to the maximum conductance. The conservation ratio of G_read after 300 s was increased from 71% to 86% with SAM treatment. c) Potentiation and depression data measured from 16 synapse devices. The measured curves show good agreement with the dotted simulated data calculated stepwise in Figure 4. d) Programming slope and e) linearity and symmetry parameters extracted from the data in (c). All 16 devices showed outstanding linearity and symmetry with small variations. Potentiation slope: 1.95 ± 0.01 nS per pulse. Depression slope: −1.92 ± 0.05 nS per pulse. Non-linearity and asymmetry parameters: α_p = −0.14 ± 0.09, α_d = −0.57 ± 0.33, asymmetry = 0.43. f) P/D data measured with varying V_read. I_read increases in proportion to V_read as well as G_read, which confirms the applicability of the IGZO 2T synapse to MVM operation.
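The network dimensions and ADC resolution quoted for the simulation can be checked with a few lines of arithmetic. The sketch below counts the synaptic weights of the 784-256-128-10 perceptron (biases excluded) and compares the number of 9-bit ADC codes with the 550 weight levels. The uniform quantizer is a generic illustration and an assumption here; it is not meant to describe how aihwkit models its converters.

```python
LAYERS = [784, 256, 128, 10]   # layer widths quoted in the text
ADC_BITS = 9
WEIGHT_LEVELS = 550


def count_weights(layers):
    """Number of weights in a fully connected network, excluding biases."""
    return sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))


def adc_levels(bits):
    """Number of distinct output codes of an ideal converter."""
    return 2 ** bits


def quantize(x, x_max, bits):
    """Map x in [0, x_max] to the nearest of 2**bits uniformly spaced codes."""
    x = min(max(x, 0.0), x_max)          # clip to the converter range
    return round(x / x_max * (adc_levels(bits) - 1))


n_weights = count_weights(LAYERS)
n_codes = adc_levels(ADC_BITS)
print(f"{n_weights} weights, {n_codes} ADC codes for {WEIGHT_LEVELS} weight levels")
```

A 9-bit converter is the nearest power-of-two resolution to the 550 available levels, slightly below it, while 10 bits would provide almost twice as many codes as there are levels.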

As shown in Figure 6a, the error in MNIST recognition rapidly decreased to about 2% within 20 epochs. The final accuracy was 98.04% with the SAM treatment. Considering that the maximum accuracy achieved through numerical training was 98.07%, our synapse device showed nearly ideal performance for high-precision neural network operations. We also conducted simulations with an artificially increased standard deviation (δ) of the inter-device programming slope, assuming that process variations can arise when our synapse device is applied to a larger array (Figure 6b). There was no significant increase in error, even when δ was increased up to 200%, which indicates an outstanding immunity to device-to-device variation. Previous studies have shown that the inference accuracy is less sensitive to device-to-device variation when excellent linearity and symmetry of the device are ensured, because the weight can converge to an appropriate value through repeated feedback during training. [50] If the symmetry is insufficient, a weight that is repeatedly updated up and down near its optimal point acquires an offset caused by the asymmetry, resulting in a significant accuracy degradation. Since our synapse device showed excellent performance in image classification and robustness to device variation, we expect that our device can be fabricated in massive-scale arrays and applied to neural network models with much higher complexities.
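The variation study described above scales the spread of the per-device programming slope. The sketch below shows one way such a stress test could be set up: it draws per-device slopes around the measured mean and multiplies the relative standard deviation by a scale factor. The Gaussian distribution, the seed handling, and the reading of "200%" as a scale factor of 2 are all assumptions; the 1.95 nS mean slope and the 2.73% relative deviation are the values reported in the text.

```python
import random
import statistics


def sample_slopes(mean_slope, rel_std, scale, n_devices, seed=0):
    """Draw per-device slopes with relative spread rel_std * scale (Gaussian assumed)."""
    rng = random.Random(seed)  # fixed seed so different scales reuse the same draws
    return [
        mean_slope * (1.0 + rel_std * scale * rng.gauss(0.0, 1.0))
        for _ in range(n_devices)
    ]


MEAN_SLOPE = 1.95e-9   # S per pulse, measured potentiation slope
REL_STD = 0.0273       # standard deviation / average reported in the text

baseline = sample_slopes(MEAN_SLOPE, REL_STD, scale=1.0, n_devices=1000)
stressed = sample_slopes(MEAN_SLOPE, REL_STD, scale=2.0, n_devices=1000)

spread_ratio = statistics.pstdev(stressed) / statistics.pstdev(baseline)
print(f"spread ratio stressed/baseline: {spread_ratio:.2f}")
```

In a full study each sampled slope would set the update step of one simulated synapse, and the training run would be repeated for every scale factor.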

Conclusion
We implemented an artificial synapse device composed of two IGZO transistors. Our synapse device showed nearly ideal synaptic behaviors with α_p = −0.14, α_d = −0.57, and asymmetry = 0.43. The device-to-device variation of the programming slope was as low as 2.73% owing to the large-area uniformity of IGZO. In addition, a novel SAM treatment was utilized to reduce the transistor current without degradation in reliability for enhanced synaptic behaviors. With SAM, the number of linearly programmable levels was increased, and the retention characteristic was improved by suppressing the transistor off current. With those outstanding synaptic characteristics, our device achieved an accuracy of 98.04% in MNIST simulation, almost identical to the numerically trained model. The accuracy was not significantly degraded even when the device-to-device variation was intentionally doubled, indicating good compatibility with massive-scale array implementation of our device.

Experimental Section
Device Fabrication: The synapse device in this study was fabricated on a SiO2 wafer. The substrate was cleaned by sequential sonication in acetone and isopropyl alcohol for 5 min each. For gate electrodes, an Al thin film (50 nm) was deposited by e-beam evaporation. The gate electrode was patterned by standard photolithography, followed by a wet etching process. A negative photoresist layer was patterned by photolithography for a lift-off process. Then, SiO2 (30 nm) and IGZO (20 nm) layers were sequentially deposited by RF magnetron sputtering as a gate insulator and semiconducting layer, respectively. The lift-off was then conducted by sequential sonication in acetone and isopropyl alcohol for 10 min each. The sample was then annealed in 0.7 atm O2 for 2 h at 330 °C. A SAM treatment was applied by soaking the sample in a 3 mM ethanol solution of alkyl-phosphonic acid for 12 h and washing it in ethanol and de-ionized water to remove the residues. A negative photoresist layer was patterned for lift-off of source/drain electrodes, and Mo (10 nm) and Al (100 nm) layers were sequentially deposited by e-beam evaporation. After the source/drain lift-off, the sample was annealed in 0.7 atm N2 for 2 h at 160 °C.
Device Characterization: All the pulse signals for the programming were generated by the Keysight 33520B function generator, and the current measurement was performed by the Keysight B1500A semiconductor analyzer.

Table 1 notes: a) Ferroelectric tunnel junction; b) incremental pulse was applied for linearity and symmetry; c) ratio between standard deviation and average of the programming slopes.

Figure 6. Simulation results of MNIST classification using IGZO 2T synapse devices. a) A test error lower than 2.2% was achieved within 20 epochs of training, which is almost identical to the numerically trained model. The test accuracy was 98.04% with the SAM treatment due to the extremely linear and symmetric weight modulation of the synapse device. b) Simulation with deliberately increased device-to-device standard deviations (δ) in the programming slopes. A negligible increase in test error was observed as δ was intentionally increased up to 200%. The superior immunity to inter-device variation resulted from the high linearity and symmetry of our synapse device.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.