Programmable low-power consumption all-optical nonlinear activation functions using a micro-ring resonator with phase-change materials

Abstract: A programmable hardware implementation of all-optical nonlinear activation functions for different scenarios and applications in all-optical neural networks is essential. We demonstrate a programmable, low-loss all-optical activation function device based on a silicon micro-ring resonator loaded with phase change materials. Four different nonlinear activation functions, Relu, ELU, Softplus and radial basis functions, are implemented for incident signal light of the same wavelength. The maximum power consumption required to switch between the four different nonlinear activation functions is calculated to be only 1.748 nJ. The simulation of the classification of handwritten digit images also shows that they perform well as alternative nonlinear activation functions. The device we design can serve as a nonlinear unit in photonic neural networks, while its nonlinear transfer function can be flexibly programmed to optimize the performance of different neuromorphic tasks.


Introduction
With the rapid development of information technologies such as big data, cloud computing, and smart terminals, global data traffic is growing geometrically. In the era of artificial intelligence (AI), computing systems based on traditional architectures therefore face serious challenges in terms of energy efficiency and volume, as they are limited by Moore's law [1,2]. Consequently, AI technologies represented by neural networks are rapidly developing toward high speed and low power consumption [3].
Optical neurons are one of the key technologies in optical neuromorphic computing. The nonlinear activation function (NLAF), a key element of the perceptron [15] in the optical neuron, is crucial to the training and decision mapping processes of the network. Compared to those in electrical neurons, all-optical NLAFs are not yet mature [16]. Optical devices can offer a very large bandwidth and low power consumption; photonics therefore provides advantages in connectivity and matrix multiplication over electronics. A multitude of photonic devices exhibit nonlinear transfer functions that resemble neuron-like or gate-like transfer functions; however, a nonlinear response alone is not sufficient for a photonic device to act as a neuron. Photonic neurons must be capable of reacting to multiple optical inputs (fan-in), applying a nonlinearity, and producing an optical output suitable to drive other like photonic neurons (cascadability). Optical devices in particular face fundamental challenges in satisfying these requirements [3].
In recent years, several all-optical nonlinear activation functions have been proposed, based on interference between the dipoles of the plasmon oscillation in metal nanoparticles and the exciton transition in a quantum dot [17], photonic crystal Fano lasers [18], the conventional non-volatile switching of phase change materials (PCMs) [8], and the volatile switching of PCMs excited with a free-space femtosecond laser pulse [19]. These are all ultra-fast, compact, on-chip solutions for neuromorphic photonic computing.
Usually, for different AI applications, the activation function needs to be selected according to the specific task [16]. In addition, the choice of activation function affects the overall average test accuracy [20]. Experimental results show that radial basis functions (RBF) in support vector machines [21], ReLU in deep learning networks for a 50-hour English Broadcast News task [22], ELU on different vision datasets [23], and Softplus in deep learning networks for phone recognition tasks [24] significantly outperform some other activation functions. Therefore, it is essential to achieve programmability of the all-optical NLAFs.
At present, tuning the bias pulse energy injected into a periodically poled thin-film lithium niobate (PPLN) nanophotonic waveguide [25] can be used to implement commonly used variants of the Relu function. A cavity paired with tuning of the biases on the interferometers [26] and varying the wavelength of the light input to a racetrack resonator with a span of Ge/Si hybrid waveguide [27] provide programmability among different kinds of activation functions. However, programmable all-optical nonlinear activation functions have yet to be optimized in terms of power consumption.
In this paper, we propose a programmable, low-loss all-optical activation function device based on a silicon micro-ring resonator loaded with PCMs. The NLAF relies on the nonlinear properties of the silicon micro-ring resonator, which are due to thermal and free-carrier-related nonlinearities. Programmability is achieved by setting the Ge2Sb2Te5 (GST) PCM loaded on the micro-ring resonator to four different intermediate states (refractive indexes) between the crystalline and amorphous states. Four different NLAFs, Relu, ELU, Softplus and RBF, are implemented for incident signal pulses at the same wavelength. The non-volatility of GST is used to maintain the four nonlinear activation functions without any extra power consumption. The maximum power consumption required to switch between the four different NLAFs is only 1.748 nJ. Finally, we simulate the benchmark MNIST handwritten digit classification task using our all-optical NLAFs and obtain accuracies of at least 94.8%, which demonstrates the prospect of our scheme for future applications in all-optical neural networks.

Coupled mode theory for GST-loaded silicon micro-ring
We design an add-drop micro-ring resonator with a radius of 10 µm, loaded with a 0.5 µm long GST film, as shown in Fig. 1. The NLAFs are designed for TE-mode signal light. When the signal light is injected into the micro-ring, the two-photon absorption (TPA) effect in the micro-ring generates free carriers, which cause free carrier absorption (FCA) and free carrier dispersion (FCD). In addition, the TPA and FCA effects induce a thermo-optic (TO) effect. The FCD effect causes the resonant wavelength to blue-shift on a time scale of a few nanoseconds, whereas the red-shift of the resonant wavelength caused by the TO effect occurs on a time scale of tens of nanoseconds [28]. Although the micro-ring resonator exhibits self-pulsation when both FCD and TO effects are present, it cannot be used as an optical NLAF device in our design when both effects are non-stationary.
We modeled the optical propagation in the micro-ring resonator using a nonlinear coupled-mode theory approach based on [29-32], extended to include the contributions of the GST and of an additional straight waveguide coupled to the micro-ring resonator: the GST changes the refractive index of the Si/GST hybrid waveguide and introduces extra loss, and the straight waveguide induces extra coupling loss in our updated model. The coupled ordinary differential equations are expressed in Eq. (1), where $u$ is the temporal evolution of the intracavity field, $N$ is the free-carrier density, and $\Delta T$ is the temperature change in the GST-loaded micro-ring resonator.
where $S_{in}$ is the amplitude of the signal light (signal light power $P_{in} = |S_{in}|^2$), $\omega$ is the frequency of the signal light, and $\omega_{r\_hy}$ is the resonance frequency of the hybrid GST/Si micro-ring cavity. The remaining parameters are listed in Table 1. $\gamma_{loss\_hy}$ and $\gamma_{abs\_hy}$ represent the total and absorption losses in the hybrid GST/Si micro-ring cavity, respectively.
where we have introduced the coupling loss into the waveguide $\gamma_{coup}$ (with $\kappa = i\sqrt{\gamma_{coup}}$) and the radiation loss $\gamma_{rad}$. In the hybrid GST on silicon micro-ring, absorption arises from linear surface absorption, TPA and FCA. In silicon on insulator (SOI), $\eta_{lin} = \gamma_{abs,lin}/(\gamma_{rad} + \gamma_{abs,lin}) \approx 0.4$ [33,34] and $\gamma_{rad} + \gamma_{abs,lin} = c\,\alpha_{ring\_hy}/n_{g\_hy}$, where $\alpha_{ring\_hy}$ and $n_{g\_hy}$ are the total loss and group index of the hybrid GST on silicon micro-ring cavity:
$$\alpha_{ring\_hy} = (1-\zeta)\,\alpha_{Si\_ring} + \zeta\,\alpha_{Si\_GST}, \qquad n_{g\_hy} = (1-\zeta)\,n_{g\_si} + \zeta\,n_{g\_PCM} \tag{4}$$
where $\zeta = L_{PCM}/(2\pi R)$, and $\alpha_{Si\_GST}$ and $n_{g\_PCM}$ depend on the state of the GST film. Both TO and FCD effects cause a significant shift in the resonance frequency $\delta\omega_{nl\_hy}$, whereas the shift caused by the Kerr effect is negligible; the shift is obtained using first-order perturbation theory.
Equation (1) has steady-state solutions when $\partial u/\partial t = 0$, $\partial N/\partial t = 0$, and $\partial \Delta T/\partial t = 0$. The corresponding linearized matrix M is obtained by adding small perturbations to the steady-state solutions, substituting the perturbed variables into the normalized differential equations, and omitting higher-order terms. Following the normalization method described in [26], this yields a 4 × 4 matrix M. Thus, we can find a stable fixed point (i.e., a point to which the system relaxes back after a small perturbation) that is suitable for NLAFs if the real parts of all four eigenvalues of M are negative.
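The length-weighted averaging in Eq. (4) can be illustrated with a short Python sketch. The geometry (0.5 µm GST film, 10 µm ring radius) is taken from the text; the loss and group-index values are placeholders chosen only for illustration, since Table 1 is not reproduced here, and the helper names are hypothetical.

```python
import math


def pcm_fill_fraction(l_pcm_um: float, radius_um: float) -> float:
    """zeta = L_PCM / (2*pi*R), as defined below Eq. (4)."""
    return l_pcm_um / (2.0 * math.pi * radius_um)


def hybrid_average(zeta: float, bare_value: float, loaded_value: float) -> float:
    """Length-weighted average used in Eq. (4) for both loss and group index."""
    return (1.0 - zeta) * bare_value + zeta * loaded_value


# Geometry from the text: 0.5 um long GST film on a ring of radius 10 um.
zeta = pcm_fill_fraction(0.5, 10.0)

# Placeholder material values (NOT from the paper), for illustration only.
alpha_si_ring = 1.0   # loss of the bare Si ring section, arbitrary units
alpha_si_gst = 500.0  # loss of the GST-loaded section, arbitrary units
n_g_si = 4.2          # assumed group index of the bare Si waveguide
n_g_pcm = 4.5         # assumed group index of the GST-loaded section

alpha_ring_hy = hybrid_average(zeta, alpha_si_ring, alpha_si_gst)
n_g_hy = hybrid_average(zeta, n_g_si, n_g_pcm)

print(f"zeta = {zeta:.5f}")
print(f"alpha_ring_hy = {alpha_ring_hy:.3f}, n_g_hy = {n_g_hy:.4f}")
```

Because the GST covers less than 1% of the circumference, the hybrid group index stays close to that of the bare ring, while the hybrid loss can still change noticeably if the loaded section is much lossier.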

We aim to ensure that the micro-ring is at a stable fixed point when GST is in different states. We analyzed the corresponding relation between the signal light power and intracavity energy in Fig. 2(a) as well as the real and the imaginary parts of the four eigenvalues of the M matrix in Fig. 2(b)-(e) when the crystallization fraction of GST is 50-80%. Figure 2(b)-(e) shows that the real parts of all four eigenvalues are negative and that the micro-ring is at a stable fixed point when the GST is in one of these four states. Only two solid and dashed lines are shown in Fig. 2(b)-(e) because two of the four eigenvalues are conjugate to each other.
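The stability criterion used here (all eigenvalues of the linearized matrix have negative real parts) can be checked numerically as in the following minimal sketch. The 4 × 4 matrices are toy placeholders, not the device matrix M, which depends on parameters not reproduced in this text; the function name is hypothetical.

```python
import numpy as np


def is_stable_fixed_point(jacobian: np.ndarray) -> bool:
    """Return True if every eigenvalue of the linearized matrix has a negative real part."""
    eigenvalues = np.linalg.eigvals(jacobian)
    return bool(np.all(eigenvalues.real < 0.0))


# Toy example: one complex-conjugate pair (-1 +/- 2j) and two real eigenvalues.
stable_m = np.array([
    [-1.0,  2.0,  0.0,  0.0],
    [-2.0, -1.0,  0.0,  0.0],
    [ 0.0,  0.0, -0.5,  0.0],
    [ 0.0,  0.0,  0.0, -3.0],
])

# Same matrix with one eigenvalue moved to the right half-plane.
unstable_m = stable_m.copy()
unstable_m[2, 2] = 0.2

print(is_stable_fixed_point(stable_m))
print(is_stable_fixed_point(unstable_m))
```

The toy matrix contains a complex-conjugate pair, which share one real part and have mirrored imaginary parts, similar to the conjugate eigenvalues mentioned for Fig. 2(b)-(e).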
Moreover, we analyzed the relationship between the signal light wavelength detuning and stability, as shown in Fig. 2(f). With the crystallization fraction of the GST loaded on the micro-ring fixed at 50%, the relationship between the output and input powers of the micro-ring was determined separately for signal light wavelength detunings from 100 to 250 pm. The red dots correspond to unstable fixed points, which appear only when the signal light wavelength is far from the resonant wavelength. Therefore, they do not affect the optical NLAFs discussed below.

Implementation of programmability
While the nonlinearity results primarily from the power-dependent nonlinear phase change due to the free-carrier and TO effects, the change in the state of the loaded GST is equally important to achieve the programmability of the NLAF. As shown in Fig. 3(a)-(d), when the wavelength of the incident signal light is 1549.38 nm, the relation between the output power and the input power takes the form of four different optical NLAFs (RBF, Relu, Softplus, and ELU) as the crystallization fraction of the GST loaded on the micro-ring increases from 50% to 80%. There is good agreement between the ideal activation function (dotted red line) and the device response (solid blue line). The switching between the different nonlinear activation functions is determined by the state of the loaded GST. The loaded GST is initially crystalline and is brought to a crystallization fraction of 80% by optical pulses in the TM mode.
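For reference, the textbook forms of the four ideal activation functions can be written as the following Python sketch. These are the standard definitions, not the fitted device responses of Fig. 3; the Gaussian form of the RBF, its `center` and `width` parameters, and the ELU parameter `alpha` are assumptions, since the text does not specify them.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit; alpha sets the negative saturation level (assumed 1)."""
    return x if x > 0.0 else alpha * (math.exp(x) - 1.0)


def softplus(x: float) -> float:
    """Smooth approximation of ReLU, log(1 + exp(x)), in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_rbf(x: float, center: float = 0.0, width: float = 1.0) -> float:
    """Gaussian radial basis function (assumed form; center and width are illustrative)."""
    return math.exp(-((x - center) / width) ** 2)


for name, fn in [("Relu", relu), ("ELU", elu), ("Softplus", softplus), ("RBF", gaussian_rbf)]:
    print(name, [round(fn(v), 4) for v in (-2.0, 0.0, 2.0)])
```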
The increase in the loaded-GST crystallization fraction implies an increase in the refractive index and loss, so the resonant wavelength is red-shifted. The state of the loaded GST is set by control light with a wavelength of 1546.9 nm in the TM mode. Switching among different crystallization fractions of the GST can be realized by changing the power and duration of the injected optical pulse [35]. Thus, it is possible to achieve reversible switching among different activation functions. Figure 3(e) shows the evolution of the transmission with respect to the input light power under different states of the loaded GST. It can be observed that the resonant wavelength of the micro-ring is red-shifted with an increase in the loaded-GST crystallization fraction, while the magnitude of the wavelength detuning remains at most 50 pm. This further supports that our NLAFs work at a stable fixed point.
When the crystallization fraction of the loaded-GST is 50%, the incident signal light wavelength is longer than and relatively far from the resonant wavelength, with a detuning of -50 pm. Thus, as the signal light power increases, the output power undergoes a drop followed by a linear increase, resulting in an RBF, as shown in inset A of Fig. 3(e).
As the GST crystallization fraction increases to 60%, the red shift of the resonant wavelength causes the signal light wavelength to be closer to the resonant wavelength, even though it is still longer than the resonant wavelength, with a detuning of -23.4 pm. Thus, there is a mechanism whereby the output power remains almost zero as the input optical power increases, and then begins to rise as the input power continues to increase. This makes the Relu function, shown in inset B of Fig. 3(e), feasible.
After the loaded GST is further crystallized up to 70%, the resonant wavelength is red-shifted to a point shorter than the signal light wavelength. At this point, the resonant wavelength is very close to the signal light wavelength, and the detuning is 4.3 pm. As the input optical power increases, the output power increases very slowly resulting in an almost-zero initial output optical power, and then it increases linearly with the input power; thus, the Softplus function is generated, as shown in inset C of Fig. 3(e).
Finally, when the GST crystallization fraction is 80%, the resonant wavelength is further red-shifted towards a wavelength shorter than that of the signal light, and the detuning is -33.3 pm. The output power always increases with the input power; therefore, the ELU function can be obtained, as shown in inset D of Fig. 3(e).
In addition, to allow the NLAF to work stably for a long time, we need to ensure that the signal light does not change the state of the loaded GST. Referring to [36], the detected optical power range in the experiment is generally between -6 and -2 dBm; consequently, based on the modulation depth in Fig. 3(e), we conclude that the signal light power coupled into the Si/GST hybrid waveguide is less than -3 dBm. Thus, the signal light changes neither the loaded-GST state nor the stability of the optical NLAFs.
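To put the dBm levels quoted above in linear units, the standard conversion P[mW] = 10^(P[dBm]/10) can be applied, as in this short sketch (the function names are illustrative):

```python
import math


def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw: float) -> float:
    """Convert optical power from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


# Power levels mentioned in the text.
for level_dbm in (-6.0, -3.0, -2.0):
    print(f"{level_dbm:+.0f} dBm = {dbm_to_mw(level_dbm):.3f} mW")
```

A level of -3 dBm corresponds to roughly 0.5 mW, which is about two orders of magnitude below the 107 mW peak power of the control pulses used for switching.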

Simulation of benchmark MNIST handwritten digit classification
To validate the applicability of our NLAFs, we performed classification simulations on the MNIST dataset using the responses in Fig. 3(a)-(d), which are abbreviated as ONAF-RBF, ONAF-Relu, ONAF-Softplus and ONAF-ELU, respectively. In particular, we use a rational function to fit our discrete data in Fig. 3 and ensure that the fitting curve passes through the original points; only the positive part of the function is considered.
We simulated a three-layer fully connected neural network and studied its accuracy in a benchmark MNIST handwritten digit classification task, as illustrated in Fig. 4(a). Each input image in the MNIST dataset has 28 × 28 pixels. The raw images were first flattened into one-dimensional arrays; the resulting 784 input pixels were then fed into the three-layer network, and the output elements were normalized to represent the probabilities of digits 0 to 9. Networks with ONAF-RBF, ONAF-Relu, ONAF-Softplus and ONAF-ELU perform well in this task, with accuracies of 96%, 96.4%, 95.3% and 94.8%, respectively. The cross-entropy loss on the training dataset during training is shown in Fig. 4(b), and the test accuracy in Fig. 4(c). We also adopted the Tanh activation function to verify the programmability and the efficiency. Besides this classification example, there are other applications where such activation functions are routinely used for artificial neural network tasks.
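The data flow described above (flatten 28 × 28 images to 784 inputs, pass them through a fully connected network with a fitted activation, and normalize the 10 outputs to probabilities) can be sketched as a forward pass in Python. This is only an illustration under stated assumptions: the hidden-layer sizes (128 and 64), the weight initialization, and the rational-function coefficients are hypothetical and are not the values used in the simulations; random arrays stand in for MNIST images.

```python
import numpy as np


def rational_activation(x, a1=0.1, a2=1.0, b1=0.5, b2=0.2):
    """Rational function through the origin, evaluated on non-negative inputs only.

    The coefficients are placeholders, not a fit to the device data of Fig. 3.
    """
    x = np.maximum(x, 0.0)
    return (a1 * x + a2 * x**2) / (1.0 + b1 * x + b2 * x**2)


def softmax(z):
    """Normalize each row of logits to probabilities."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(rng, sizes=(784, 128, 64, 10)):
    """Random weights and zero biases for each layer (layer sizes are assumed)."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(images, params, activation):
    """Flatten the images and apply three fully connected layers."""
    x = images.reshape(images.shape[0], -1)  # (batch, 28, 28) -> (batch, 784)
    (w1, c1), (w2, c2), (w3, c3) = params
    h1 = activation(x @ w1 + c1)
    h2 = activation(h1 @ w2 + c2)
    return softmax(h2 @ w3 + c3)


rng = np.random.default_rng(0)
params = init_params(rng)
fake_images = rng.random((4, 28, 28))  # stand-in for MNIST images
probs = forward(fake_images, params, rational_activation)
print(probs.shape, probs.sum(axis=1))
```

Passing the activation as an argument mirrors the programmability of the device: switching from one NLAF to another changes only the `activation` callable, not the network structure.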

Control light
GST has a high refractive index contrast between its amorphous and crystalline states. We can induce a slight change in the resonant wavelength of the hybrid GST on silicon micro-ring using the intermediate crystallographic states of the GST, that is, states with a mixture of crystalline and amorphous regions.
Therefore, we used a control light to manipulate the state of the GST. We chose the TM mode for the control light because its electric field is more concentrated on the upper and lower sides of the waveguide than that of the TE mode, as shown in Fig. 5(a) and (b); the GST therefore overlaps with the optical field over a larger area and absorbs more optical power. This enables a lower power consumption for switching between different NLAFs. However, when the state of the GST changes, the resonant wavelength of the micro-ring and the optical power coupled into the cavity also change. Accordingly, we simulated the transmission spectrum of the micro-ring for GST crystallization fractions from 50% to 80%, as shown in Fig. 5(c). When the control light wavelength is 1546.9 nm, the power coupled into the micro-ring varies by no more than 2%, as shown in the inset of Fig. 5(c). As a result, for the control light at 1546.9 nm, approximately 25% of the energy is coupled into the cavity at all of these crystallization fractions.

Power consumption
We analyzed the power consumption for switching among the four optical NLAFs at a control light wavelength of 1546.9 nm. Since crystalline GST has a larger thermal conductivity and the crystallization process takes a longer time [37], we conclude that the highest power consumption for switching between the four optical NLAFs corresponds to switching from RBF to ELU (crystallization fraction from 50% to 80%), whereas the lowest corresponds to switching from ELU to Softplus (crystallization fraction from 80% to 70%).
We then analyzed the switching process between two states following [38]. The power consumption required for state switching is determined by both the duration and the power of the incident optical pulse. Because the relaxation time of the TO effect is in the nanosecond regime [39] and the pulse power is limited by the ablation temperature of the GST, we chose a single pulse with $P_1 = 107$ mW, $t_1 = 1$ ns and a two-step pulse with $P_1 = 107$ mW, $t_1 = 1$ ns, $P_2 = 10$ mW, $t_2 = 29$ ns to realize the crystallization and amorphization processes, respectively, as shown in Fig. 6. The maximum and minimum power consumption required to switch between the optical NLAFs are 1.748 nJ and 0.428 nJ, respectively. A major advantage of our device is its non-volatility: no additional power supply is required to maintain the state.
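As a simple illustration of how pulse power and duration combine, the energy carried by a piecewise-constant optical pulse is the sum of power times duration over its segments. The sketch below evaluates this for the two pulse shapes quoted above; it computes only the energy of the pulses themselves and does not reproduce the switching-energy analysis following [38], so its outputs are not expected to equal the reported switching figures. The function name is hypothetical.

```python
def pulse_energy_pj(segments):
    """Energy of a piecewise-constant pulse.

    segments: iterable of (power_mW, duration_ns) pairs; 1 mW * 1 ns = 1 pJ.
    """
    return sum(power_mw * duration_ns for power_mw, duration_ns in segments)


# Pulse shapes quoted in the text.
single_pulse = [(107.0, 1.0)]                 # P1 = 107 mW for t1 = 1 ns
two_step_pulse = [(107.0, 1.0), (10.0, 29.0)]  # followed by P2 = 10 mW for t2 = 29 ns

for label, pulse in [("single", single_pulse), ("two-step", two_step_pulse)]:
    energy_pj = pulse_energy_pj(pulse)
    print(f"{label}: {energy_pj:.0f} pJ = {energy_pj / 1000.0:.3f} nJ")
```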

Conclusions
We designed a programmable, low-loss all-optical activation function device based on a silicon micro-ring resonator loaded with PCMs. The NLAF relied on the nonlinear properties of the silicon micro-ring resonator. Programmability was achieved by configuring the state of the GST loaded on the micro-ring resonator. Four different nonlinear activation functions, Relu, ELU, Softplus, and RBF, were implemented for the same incident signal light. The maximum power consumption required to switch between the four different NLAFs was only 1.748 nJ. Simulation of the classification of handwritten digit images also showed that they performed well as alternative NLAFs. Because of the non-volatility of GST, each implementation of the network after determining the NLAF does not need to be reconfigured and consumes almost no energy, thereby achieving a genuinely low-power programmable all-optical NLAF. This demonstrates the potential of the proposed scheme for future applications in all-optical neural networks.

Disclosures. The authors declare no conflicts of interest.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.