Frequency-Domain Models for Nonlinear Microwave Devices Based on Large-Signal Measurements

In this paper, we introduce nonlinear large-signal scattering (S) parameters, a new type of frequency-domain mapping that relates incident and reflected signals. We present a general form of nonlinear large-signal S-parameters and show that they reduce to classic S-parameters in the absence of nonlinearities. Nonlinear large-signal impedance (Z) and admittance (Y) parameters are also introduced, and equations relating the different representations are derived. We illustrate how nonlinear large-signal S-parameters can be used as a tool in the design process of a nonlinear circuit, specifically a single-diode 1 GHz frequency-doubler. For the case where a nonlinear model is not readily available, we developed a method of extracting nonlinear large-signal S-parameters from artificial neural network models trained with multiple measurements made by a nonlinear vector network analyzer equipped with two sources. Finally, nonlinear large-signal S-parameters are compared to another form of nonlinear mapping, known as nonlinear scattering functions. The nonlinear large-signal S-parameters are shown to be more general.

Although the measurement of S-parameters by vector network analyzers (VNAs) is invaluable to the microwave designer for modeling and measuring linear circuits, these measurements are often inadequate for nonlinear circuits operating at large-signal conditions, since nonlinearities transfer energy from the stimulus frequency to products at new frequencies.
Thus, conventional linear network analysis, which relies on the assumption of superposition, must be replaced by a more general type of analysis, which we refer to as nonlinear network analysis.
Nonlinear network analysis involves characterizing a nonlinear device under realistic, large-signal operating conditions. To do this, complex traveling waves (rather than ratios) are measured at the ports of a DUT not only at the stimulus frequency (or frequencies), but also at other frequencies where energy may be created. Assuming the input signals are sine-waves and the DUT exhibits neither sub-harmonic nor chaotic behavior, the input and output signals will be combinations of sine-wave signals, caused by the nonlinearity of the DUT in conjunction with impedance mismatches between the measuring system and the DUT. If a single excitation frequency is present, new frequency components will appear at harmonics of the excitation frequency, and if multiple excitation frequencies are present, new frequency components will appear at the intermodulation products as well as at harmonics of each of the excitation frequencies. In practice, there will be a limited number of significant harmonics and intermodulation products. The set of frequencies at which energy is present and must be measured is known as the frequency grid.
Nonlinear vector network analyzers (NVNAs) are a class of instruments capable of providing accurate waveform vectors by acquiring and correcting the magnitude and phase relationships between the fundamental and harmonic components in the periodic signals [1][2][3][4][5]. An NVNA excites a nonlinear DUT with one or more sine-wave signals and detects the response of the DUT at its signal ports. As noted above, provided the DUT exhibits no sub-harmonic or chaotic behavior, the input and output signals are combinations of sine-wave signals. The major difference between a linear VNA and an NVNA is therefore that a VNA measures ratios between input and output waves one frequency at a time, while an NVNA measures the actual input and output waves simultaneously over a broad band of frequencies.
Even though S-parameters cannot adequately represent nonlinear circuits, some type of parameters relating incident and reflected signals is beneficial, so that designers can "see" application-specific engineering figures of merit similar to those they are accustomed to. In the first part of this paper, we propose definitions of such ratios, which we refer to as nonlinear large-signal scattering (S) parameters. We also introduce nonlinear large-signal impedance (Z) and admittance (Y) parameters, and present equations relating the different representations. Next, we make two simplifications by considering the cases of a one-port network with a single-tone excitation and a two-port network with a single-tone excitation.
For existing nonlinear models, we can readily generate nonlinear large-signal S-parameters by performing a harmonic-balance simulation. For devices with no model available, we can extract these parameters from artificial neural network (ANN) models that are trained with multiple frequency-domain measurements made on a nonlinear DUT with an NVNA. To illustrate applications and generation of nonlinear large-signal S-parameters, we present two examples. First, we illustrate how nonlinear large-signal S-parameters can be used as a tool in the process of designing a simple nonlinear circuit, specifically a single-diode 1 GHz frequency-doubler circuit. Second, we describe a method for generating nonlinear large-signal S-parameters based upon ANN models trained on frequency-domain data measured using an NVNA. We compare a diode circuit model, generated using this method, to a harmonic-balance simulation of a commercial device model.
Finally, we compare our nonlinear large-signal S-parameters to another form of nonlinear mapping, known as nonlinear scattering functions [6][7]. Specifically, we show that the two formulations are not equivalent. Nonlinear large-signal S-parameters are more general than the nonlinear scattering functions, which are useful in approximating a specific class of nonlinearity in a more compact form.
Unlike linear S-parameters, nonlinear large-signal S-parameters depend upon the signal magnitude and must take into account the harmonic content of the input and output signals, since energy can be transferred to other frequencies in a nonlinear device. After presenting the general form of nonlinear large-signal S-parameters, we also introduce nonlinear large-signal impedance (Z) and admittance (Y) parameters, and present equations for relating the different representations. Next, we make two simplifications in which we consider the cases of a one-port network with a single-tone excitation and a two-port network with a single-tone excitation.

General Form
Consider an N-port network. Normalized wave variables $a_{jl}$ and $b_{jl}$ at the jth port and lth harmonic are proportional to the incoming and outgoing waves, respectively, and may be defined in terms of the voltages associated with these waves as follows:

$$a_{jl} = \frac{V^{+}_{jl}}{\sqrt{Z_{oj}}}, \qquad b_{jl} = \frac{V^{-}_{jl}}{\sqrt{Z_{oj}}}, \tag{1}$$

where $V^{+}_{jl}$ and $V^{-}_{jl}$ represent voltages associated with the incoming and outgoing waves in the transmission lines connected to the jth port and containing frequencies of the lth harmonic, and $Z_{oj}$ represents the characteristic impedance of the line at the jth port.
The nonlinear large-signal scattering matrix S of the network expresses the relationship between a's and b's at various ports and harmonics through the matrix equation

$$\mathbf{b} = \mathbf{S}\,\mathbf{a}, \tag{2}$$

where b and a are (N × M)-element column vectors.
Here N refers to the number of ports and M refers to the number of harmonics being considered. Matrix S is an $(N \times M)^2$-element square matrix. We assume all a's and b's are phase referenced to $a_{11}$ to enforce time invariance [8].
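As an example, consider a two-port network with 3 harmonics; Eq. (2) then becomes

$$\begin{bmatrix}\mathbf{b}_1\\ \mathbf{b}_2\end{bmatrix} = \begin{bmatrix}\mathbf{S}_{11} & \mathbf{S}_{12}\\ \mathbf{S}_{21} & \mathbf{S}_{22}\end{bmatrix}\begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix}, \tag{3}$$

where

$$\mathbf{S}_{ij} = \begin{bmatrix} S_{ij11} & S_{ij12} & S_{ij13}\\ S_{ij21} & S_{ij22} & S_{ij23}\\ S_{ij31} & S_{ij32} & S_{ij33}\end{bmatrix}. \tag{4}$$

For each nonlinear large-signal scattering parameter $S_{ijkl}$, the index i refers to the port number of the b wave, the index j refers to the port number of the a wave, k is the harmonic index of the b wave, and l is the harmonic index of the a wave. The vectors are (M=3)-element vectors given by

$$\mathbf{b}_i = \begin{bmatrix} b_{i1} & b_{i2} & b_{i3}\end{bmatrix}^{T}, \qquad \mathbf{a}_j = \begin{bmatrix} a_{j1} & a_{j2} & a_{j3}\end{bmatrix}^{T}. \tag{5}$$

Equation (3) can be expanded as follows:

$$\begin{bmatrix} b_{11}\\ b_{12}\\ b_{13}\\ b_{21}\\ b_{22}\\ b_{23}\end{bmatrix} = \begin{bmatrix}
S_{1111} & S_{1112} & S_{1113} & S_{1211} & S_{1212} & S_{1213}\\
S_{1121} & S_{1122} & S_{1123} & S_{1221} & S_{1222} & S_{1223}\\
S_{1131} & S_{1132} & S_{1133} & S_{1231} & S_{1232} & S_{1233}\\
S_{2111} & S_{2112} & S_{2113} & S_{2211} & S_{2212} & S_{2213}\\
S_{2121} & S_{2122} & S_{2123} & S_{2221} & S_{2222} & S_{2223}\\
S_{2131} & S_{2132} & S_{2133} & S_{2231} & S_{2232} & S_{2233}
\end{bmatrix}\begin{bmatrix} a_{11}\\ a_{12}\\ a_{13}\\ a_{21}\\ a_{22}\\ a_{23}\end{bmatrix}. \tag{6}$$

Note that in each of the four sub-matrices, the diagonal elements contain the same-frequency scattering parameters, the upper right elements contain the frequency down-conversion scattering parameters, and the lower left elements contain the frequency up-conversion scattering parameters. If the device under consideration contains no nonlinearities (i.e., no power is transferred to other frequencies), then Eq. (6) reduces to

$$\begin{bmatrix} b_{11}\\ b_{12}\\ b_{13}\\ b_{21}\\ b_{22}\\ b_{23}\end{bmatrix} = \begin{bmatrix}
S_{1111} & 0 & 0 & S_{1211} & 0 & 0\\
0 & S_{1122} & 0 & 0 & S_{1222} & 0\\
0 & 0 & S_{1133} & 0 & 0 & S_{1233}\\
S_{2111} & 0 & 0 & S_{2211} & 0 & 0\\
0 & S_{2122} & 0 & 0 & S_{2222} & 0\\
0 & 0 & S_{2133} & 0 & 0 & S_{2233}
\end{bmatrix}\begin{bmatrix} a_{11}\\ a_{12}\\ a_{13}\\ a_{21}\\ a_{22}\\ a_{23}\end{bmatrix}, \tag{7}$$

which is the matrix representation for the well-known linear S-parameters involving three excitation frequencies.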
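The index bookkeeping above can be sketched in a few lines of Python. The helper names `flat_index` and `conversion_type` are illustrative and do not come from the paper; the sketch assumes the row and column ordering of Eq. (6), with the harmonic index running fastest within each port.

```python
def flat_index(port, harmonic, n_harmonics):
    """0-based position of (port, harmonic), both 1-based, in the stacked a or b vector."""
    return (port - 1) * n_harmonics + (harmonic - 1)


def conversion_type(k, l):
    """Classify S_ijkl by comparing the b-wave harmonic k with the a-wave harmonic l."""
    if k == l:
        return "same-frequency"
    return "up-conversion" if k > l else "down-conversion"


M = 3  # number of harmonics considered
# S_2131 relates b_23 (port 2, third harmonic) to a_11 (port 1, fundamental).
row = flat_index(2, 3, M)  # row of b_23 in the expanded matrix
col = flat_index(1, 1, M)  # column of a_11 in the expanded matrix
print(row, col, conversion_type(3, 1))
```

Within each sub-matrix, entries with k > l fall below the diagonal, which matches the statement that the lower left elements hold the up-conversion parameters.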

Nonlinear Large-Signal Impedance Parameters
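Rather than expressing the relationship between a's and b's in terms of a nonlinear large-signal scattering matrix S, we can alternatively express the relationship between voltages (V's) and currents (I's) in terms of a nonlinear large-signal impedance matrix Z, as follows:

$$\mathbf{V} = \mathbf{Z}\,\mathbf{I}, \tag{8}$$

where V and I are (N×M)-element column vectors. Once again N refers to the number of ports and M refers to the number of harmonics being considered. Z is an $(N \times M)^2$-element square matrix. For a two-port network with 3 harmonics, Eq. (8) becomes

$$\begin{bmatrix}\mathbf{V}_1\\ \mathbf{V}_2\end{bmatrix} = \begin{bmatrix}\mathbf{Z}_{11} & \mathbf{Z}_{12}\\ \mathbf{Z}_{21} & \mathbf{Z}_{22}\end{bmatrix}\begin{bmatrix}\mathbf{I}_1\\ \mathbf{I}_2\end{bmatrix}, \tag{9}$$

where

$$\mathbf{Z}_{ij} = \begin{bmatrix} Z_{ij11} & Z_{ij12} & Z_{ij13}\\ Z_{ij21} & Z_{ij22} & Z_{ij23}\\ Z_{ij31} & Z_{ij32} & Z_{ij33}\end{bmatrix}. \tag{10}$$

For each nonlinear large-signal impedance parameter $Z_{ijkl}$, the index i refers to the port number of the voltage V, the index j refers to the port number of the current I, k is the harmonic index of V, and l is the harmonic index of I. The vectors are (M=3)-element vectors given by

$$\mathbf{V}_i = \begin{bmatrix} V_{i1} & V_{i2} & V_{i3}\end{bmatrix}^{T}, \qquad \mathbf{I}_j = \begin{bmatrix} I_{j1} & I_{j2} & I_{j3}\end{bmatrix}^{T}. \tag{11}$$

Equation (9) can be expanded to

$$\begin{bmatrix} V_{11}\\ V_{12}\\ V_{13}\\ V_{21}\\ V_{22}\\ V_{23}\end{bmatrix} = \begin{bmatrix}
Z_{1111} & Z_{1112} & Z_{1113} & Z_{1211} & Z_{1212} & Z_{1213}\\
Z_{1121} & Z_{1122} & Z_{1123} & Z_{1221} & Z_{1222} & Z_{1223}\\
Z_{1131} & Z_{1132} & Z_{1133} & Z_{1231} & Z_{1232} & Z_{1233}\\
Z_{2111} & Z_{2112} & Z_{2113} & Z_{2211} & Z_{2212} & Z_{2213}\\
Z_{2121} & Z_{2122} & Z_{2123} & Z_{2221} & Z_{2222} & Z_{2223}\\
Z_{2131} & Z_{2132} & Z_{2133} & Z_{2231} & Z_{2232} & Z_{2233}
\end{bmatrix}\begin{bmatrix} I_{11}\\ I_{12}\\ I_{13}\\ I_{21}\\ I_{22}\\ I_{23}\end{bmatrix}. \tag{12}$$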

Relating S and Z Matrices
The S and Z matrices can be expressed in terms of one another if we know how a and b relate to V and I. From Eq. (1), we can express $V_{ik}$ in terms of $a_{ik}$ and $b_{ik}$ as follows:

$$V_{ik} = \sqrt{Z_{oi}}\,\left(a_{ik} + b_{ik}\right), \tag{13}$$

where the subscripts refer to the ith port and the kth harmonic. We can similarly express $I_{jl}$ as

$$I_{jl} = \frac{1}{\sqrt{Z_{oj}}}\,\left(a_{jl} - b_{jl}\right), \tag{14}$$

where the subscripts refer to the jth port and the lth harmonic.
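For simplicity, we will assume for now that the network under consideration consists of two ports. Later, we can easily generalize the equations relating the S and Z matrices for any N-port network. If we allow the two transmission lines or waveguides connecting the two ports to have different characteristic impedances, $Z_{o1}$ and $Z_{o2}$, Eq. (14) can be expressed in matrix form as

$$\begin{bmatrix}\mathbf{I}_1\\ \mathbf{I}_2\end{bmatrix} = \begin{bmatrix}\frac{1}{\sqrt{Z_{o1}}}[U] & 0\\ 0 & \frac{1}{\sqrt{Z_{o2}}}[U]\end{bmatrix}\left(\begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix} - \begin{bmatrix}\mathbf{b}_1\\ \mathbf{b}_2\end{bmatrix}\right), \tag{15}$$

where [U] is the identity matrix.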
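As a numerical sketch of how the two representations connect, the Python code below combines V = Z I with the wave definitions V = √Z_o (a + b) and I = (a − b)/√Z_o. Eliminating V and I gives S = (z + U)⁻¹(z − U), where z is Z normalized on both sides by the square roots of the line impedances. This is our own algebra from those definitions, not necessarily the form in which the original derivation states the result; the function names and the numerical values are hypothetical.

```python
import math


def mat_mul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_inv(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(A)
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def z_to_s(Z, z0):
    """S = (z + U)^-1 (z - U); z0[i] is the line impedance belonging to row i."""
    n = len(Z)
    z = [[Z[i][j] / math.sqrt(z0[i] * z0[j]) for j in range(n)] for i in range(n)]
    plus = [[z[i][j] + (i == j) for j in range(n)] for i in range(n)]
    minus = [[z[i][j] - (i == j) for j in range(n)] for i in range(n)]
    return mat_mul(mat_inv(plus), minus)


# Hypothetical 2x2 example (two ports, one harmonic), impedances in ohms.
Z = [[60 + 5j, 2 - 1j], [8 + 3j, 40 - 10j]]
z0 = [50.0, 75.0]
S = z_to_s(Z, z0)

# Consistency check: waves related by S must reproduce V = Z I.
a = [1.0 + 0j, 0.2 - 0.1j]
b = [sum(S[i][j] * a[j] for j in range(2)) for i in range(2)]
V = [math.sqrt(z0[i]) * (a[i] + b[i]) for i in range(2)]
I = [(a[i] - b[i]) / math.sqrt(z0[i]) for i in range(2)]
ZI = [sum(Z[i][j] * I[j] for j in range(2)) for i in range(2)]
residual = max(abs(V[i] - ZI[i]) for i in range(2))
print(residual)
```

For a stacked vector covering several harmonics, `z0` simply repeats each port's line impedance once per harmonic.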

Nonlinear Large-Signal Admittance Parameters
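We can also express the relationship between voltages (V's) and currents (I's) in terms of a nonlinear large-signal admittance matrix Y, as follows:

$$\mathbf{I} = \mathbf{Y}\,\mathbf{V}, \tag{26}$$

where Y is an $(N \times M)^2$-element square matrix. For a two-port network with three harmonics, for example, Eq. (26) becomes

$$\begin{bmatrix}\mathbf{I}_1\\ \mathbf{I}_2\end{bmatrix} = \begin{bmatrix}\mathbf{Y}_{11} & \mathbf{Y}_{12}\\ \mathbf{Y}_{21} & \mathbf{Y}_{22}\end{bmatrix}\begin{bmatrix}\mathbf{V}_1\\ \mathbf{V}_2\end{bmatrix}, \tag{27}$$

where

$$\mathbf{Y}_{ij} = \begin{bmatrix} Y_{ij11} & Y_{ij12} & Y_{ij13}\\ Y_{ij21} & Y_{ij22} & Y_{ij23}\\ Y_{ij31} & Y_{ij32} & Y_{ij33}\end{bmatrix}. \tag{28}$$

For each nonlinear large-signal admittance parameter $Y_{ijkl}$, the index i refers to the port number of the current I, the index j refers to the port number of the voltage V, k is the harmonic index of I, and l is the harmonic index of V. The vectors are, once again, (M=3)-element vectors, defined in Eq. (11). Equation (27) can be expanded as follows:

$$\begin{bmatrix} I_{11}\\ I_{12}\\ I_{13}\\ I_{21}\\ I_{22}\\ I_{23}\end{bmatrix} = \begin{bmatrix}
Y_{1111} & Y_{1112} & Y_{1113} & Y_{1211} & Y_{1212} & Y_{1213}\\
Y_{1121} & Y_{1122} & Y_{1123} & Y_{1221} & Y_{1222} & Y_{1223}\\
Y_{1131} & Y_{1132} & Y_{1133} & Y_{1231} & Y_{1232} & Y_{1233}\\
Y_{2111} & Y_{2112} & Y_{2113} & Y_{2211} & Y_{2212} & Y_{2213}\\
Y_{2121} & Y_{2122} & Y_{2123} & Y_{2221} & Y_{2222} & Y_{2223}\\
Y_{2131} & Y_{2132} & Y_{2133} & Y_{2231} & Y_{2232} & Y_{2233}
\end{bmatrix}\begin{bmatrix} V_{11}\\ V_{12}\\ V_{13}\\ V_{21}\\ V_{22}\\ V_{23}\end{bmatrix}. \tag{29}$$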

Relating S and Y Matrices
The S and Y matrices can also be expressed in terms of one another, using Eqs. (13) and (14) which show how a and b relate to V and I.
For simplicity, we will again assume the network under consideration consists of two ports. If we allow the two transmission lines or waveguides connecting the two ports to have different characteristic impedances $Z_{o1}$ and $Z_{o2}$, Eq. (14) can again be expressed in matrix form, as in Eq. (15).

One-Port Network With Single-Tone Excitation
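For a one-port network with a single-tone excitation at the fundamental frequency, we can extract a reflection coefficient given by

$$S_{11k1} = \left.\frac{|b_{1k}|\,\angle\left(\phi_{b_{1k}} - k\,\phi_{a_{11}}\right)}{|a_{11}|}\right|_{a_{mn}=0\ \text{for}\ (m,n)\neq(1,1)}, \tag{41}$$

where $\phi_{x}$ denotes the phase of x. The limitation imposed on the equation is that all incident waves other than $a_{11}$ equal zero. Instead of simply taking the ratio of $b_{1k}$ to $a_{11}$, we reference the phase of $b_{1k}$ to that of $a_{11}$. To do this, we must subtract k times the phase of $a_{11}$ from the phase of $b_{1k}$ [8].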
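The effect of the phase referencing can be illustrated with a short Python sketch. The function name `phase_referenced_s` and the toy second-harmonic device (b proportional to the square of a) are assumptions made for illustration only. The point is that shifting the time origin of the stimulus leaves the phase-referenced parameter unchanged, whereas the plain ratio of b to a would rotate.

```python
import cmath


def phase_referenced_s(b_k, a_11, k):
    """Magnitude |b_k|/|a_11| at angle phase(b_k) - k*phase(a_11)."""
    mag = abs(b_k) / abs(a_11)
    ang = cmath.phase(b_k) - k * cmath.phase(a_11)
    return cmath.rect(mag, ang)


# Hypothetical device: second-harmonic response b_12 = c * a_11**2.
c = 0.3 * cmath.exp(0.4j)
results = []
naive = []
for theta in (0.0, 0.7, 2.0):  # different time origins for the same stimulus
    a_11 = 0.5 * cmath.exp(1j * theta)
    b_12 = c * a_11 ** 2
    results.append(phase_referenced_s(b_12, a_11, k=2))
    naive.append(b_12 / a_11)

print(results)  # the same complex value for every theta
print(naive)    # rotates with theta
```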
For a one-port network with a single-tone excitation at the fundamental frequency, we can show that the equation relating S and Z reduces to the same well-known equation for the linear case if we assume that no energy is redistributed through frequency down-conversion. To illustrate this, we will consider only M=3 harmonics, for the sake of simplicity.
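One way to see this reduction is sketched below. It rests on two assumptions of ours: that the one-port relation V = Z I is combined with Eqs. (13) and (14) for a single line impedance $Z_{o1}$, and that "no down-conversion" means the above-diagonal elements $Z_{11kl}$ (l > k) vanish. It is not necessarily the route taken in the original derivation.

```latex
% Assumptions (ours): V = Z I, Eqs. (13)-(14) with one line impedance Z_{o1},
% and Z_{11kl} = 0 for l > k (no frequency down-conversion).
\begin{aligned}
\sqrt{Z_{o1}}\,(\mathbf{a}+\mathbf{b}) &= \mathbf{Z}\,\tfrac{1}{\sqrt{Z_{o1}}}\,(\mathbf{a}-\mathbf{b})
\;\Rightarrow\;
\mathbf{S} = (\mathbf{z}+[U])^{-1}(\mathbf{z}-[U]),
\qquad \mathbf{z} = \mathbf{Z}/Z_{o1},\\[4pt]
\mathbf{z} &= \frac{1}{Z_{o1}}
\begin{bmatrix}
Z_{1111} & 0 & 0\\
Z_{1121} & Z_{1122} & 0\\
Z_{1131} & Z_{1132} & Z_{1133}
\end{bmatrix},\\[4pt]
S_{1111} &= \frac{Z_{1111}/Z_{o1}-1}{Z_{1111}/Z_{o1}+1}
          = \frac{Z_{1111}-Z_{o1}}{Z_{1111}+Z_{o1}}.
\end{aligned}
```

The last line follows because sums, products, and inverses of lower-triangular matrices are again lower triangular, so the (1,1) element of S is just the ratio of the (1,1) diagonal entries of z − [U] and z + [U]. The result has the same form as the linear reflection coefficient.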

Two-Port Network With Single-Tone Excitation
For a two-port network excited at port 1 by a single-tone excitation at the fundamental frequency, we can extract an input reflection coefficient given by

$$S_{11k1} = \left.\frac{|b_{1k}|\,\angle\left(\phi_{b_{1k}} - k\,\phi_{a_{11}}\right)}{|a_{11}|}\right|_{a_{mn}=0\ \text{for}\ (m,n)\neq(1,1)}, \tag{52}$$

where $\phi_{x}$ denotes the phase of x. As with Eq. (41), instead of simply taking the ratio of $b_{1k}$ to $a_{11}$, we phase reference to $a_{11}$. To do this, we must subtract k times the phase of $a_{11}$ from the phase of $b_{1k}$. The limitation once again imposed on the equation is that all incident waves other than $a_{11}$ equal zero.
Another valuable parameter, the forward transmission coefficient, is similarly extracted as follows:

$$S_{21k1} = \left.\frac{|b_{2k}|\,\angle\left(\phi_{b_{2k}} - k\,\phi_{a_{11}}\right)}{|a_{11}|}\right|_{a_{mn}=0\ \text{for}\ (m,n)\neq(1,1)}. \tag{53}$$

This parameter provides a value of the gain or loss through a device, either at the fundamental frequency or converted to a higher harmonic frequency.
In addition to the previous two parameters, given in Eqs. (52) and (53), an output reflection coefficient can also be useful when trying to determine the output matching network. If a nonlinear DUT is operating under its normal drive condition ($a_{11}$ at some constant signal level), and a second source, excited by a small-signal tone at frequency $f_k$, is placed at port 2 of the DUT, one of the equations in the matrix defined by Eq. (6) reduces to

$$b_{2k} = S_{21k1}\,a_{11} + S_{22kk}\,a_{2k}. \tag{54}$$

If we solve Eq. (54) for $S_{22kk}$, we obtain

$$S_{22kk} = \frac{b_{2k} - S_{21k1}\,a_{11}}{a_{2k}}. \tag{55}$$

In Eq. (55), the output reflection coefficient $S_{22kk}$ cannot be determined by simply taking the ratio of $b_{2k}$ to $a_{2k}$, since the ratio also depends on $a_{11}$ through $S_{21k1}$. When $a_{2k}$ is small, we can generate another signal $\Delta a_{2k}$ that is offset slightly from the frequency of interest $f_k$ by $\Delta f_k$. Eq. (54) then becomes

$$b_{2k} + \Delta b_{2k} = S_{21k1}\,a_{11} + S_{22kk}\left(a_{2k} + \Delta a_{2k}\right), \tag{56}$$

where $\Delta a_{2k} \ll a_{2k}$ and $S_{22kk}$ remains constant over this frequency range. Subtracting Eq. (54) from Eq. (56) gives

$$\Delta b_{2k} = S_{22kk}\,\Delta a_{2k}, \tag{57}$$

which does not depend on $S_{21k1}$. If we solve Eq. (57) for $S_{22kk}$, we obtain

$$S_{22kk} = \frac{\Delta b_{2k}}{\Delta a_{2k}}. \tag{58}$$

Equation (58) is a quasi-linear approximation of the output reflection coefficient under normal operating conditions, and is consistent with the definition of "Hot $S_{22}$," which has been used to measure the degree of mismatch at the output port of a power amplifier at its excitation frequency.
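The difference-based extraction can be checked numerically with a small Python sketch. All parameter values below are hypothetical, and the sketch models only the algebra of the port 2 wave equation: in a measurement, the frequency offset is what allows the small response to be separated from the large term driven by the port 1 stimulus.

```python
import cmath

# Hypothetical large-signal values at harmonic k (not taken from the paper).
S_21k1 = 0.33 * cmath.exp(-1.1j)
S_22kk = 0.12 * cmath.exp(0.6j)
a_11 = 1.0 + 0j
a_2k = 0.01 * cmath.exp(0.3j)
delta_a_2k = 0.001 * cmath.exp(-0.8j)


def b_2k(a2):
    """Scattered wave at port 2, harmonic k, for a given incident wave a2."""
    return S_21k1 * a_11 + S_22kk * a2


# The plain ratio is dominated by the converted power from port 1.
naive_ratio = b_2k(a_2k) / a_2k

# Perturb the port 2 stimulus and take the ratio of the differences.
delta_b_2k = b_2k(a_2k + delta_a_2k) - b_2k(a_2k)
hot_s22 = delta_b_2k / delta_a_2k

print(abs(naive_ratio), abs(hot_s22))
```

Here the plain ratio has a magnitude far above unity, while the ratio of differences recovers the assumed output reflection coefficient.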

Summary of Sec. 2
In this section, we presented the general form of nonlinear large-signal S-parameters. Unlike linear S-parameters, nonlinear large-signal S-parameters depend upon the signal magnitude and must take into account the harmonic content of the input and output signals, since energy can be transferred to other frequencies in a nonlinear device. We also introduced nonlinear large-signal impedance (Z) and admittance (Y) parameters, and presented equations for relating the different representations. Next, we made two simplifications, considering the cases of a one-port network with a single-tone excitation and a two-port network with a single-tone excitation. For the one-port case with a single-tone excitation at the fundamental frequency, we showed that the equation relating S and Z reduces to the same well-known equation for the linear case if we assume that no energy is transferred through frequency down-conversion. For the two-port case excited at port 1 by a single-tone excitation at the fundamental frequency, we extracted an input reflection coefficient $S_{11k1}$, a forward transmission coefficient $S_{21k1}$, and a quasi-linear output reflection coefficient $S_{22kk}$.

Using Nonlinear Large-Signal S-Parameters to Design a Diode Frequency-Doubler Circuit With a Harmonic-Balance Simulator
Resistive frequency doublers operate on the principle that a sinusoidal waveform is distorted by the nonlinear I/V characteristic of a Schottky-barrier diode [9]. This distortion causes power to be generated at higher-harmonic frequencies. The design of such doublers involves separating the input and output signals by filters and determining the optimum input and output matching circuits, as illustrated in Fig. 1.
Although single-diode resistive doublers are not very efficient (analysis predicts a conversion loss of at least 9 dB [10]), we chose this circuit because it is simple enough to clearly illustrate how nonlinear large-signal S-parameters can be used as a design tool.
In the following sections, we describe the various steps involved in designing a single-diode 1 GHz frequency-doubler circuit. Since we are using a simulator, we can force the stimulus to consist of only $|a_{11}|$, with all other $a_{mn}$ terms equal to zero, where m and n are positive integers with (m, n) ≠ (1, 1). (In practice, this condition can never be completely realized in a measurement environment.) With only an $a_{11}$ component present, we need only consider the parameter $S_{11k1}$ (Eq. 52), which is a measure of the large-signal input match at the kth harmonic, the parameter $S_{21k1}$ (Eq. 53), a measure of the large-signal conversion loss or gain at the kth harmonic, and the quasi-linear $S_{2222}$ (Eq. 58), used to determine the output matching network at the second harmonic. Figure 2 illustrates the setups required for determining these parameters. Determining $S_{2222}$ requires a second source at port 2 at a frequency slightly offset from $\omega_2$.
In the first step, we perform a simulation on the diode alone and use $S_{2121}$ to determine the optimum bias condition for converting power from the fundamental frequency to the second harmonic. Second, we add filtering networks to separate the input and output signals, and verify their proper performance by looking at $S_{2111}$ and $S_{1121}$. Third, we make use of $S_{1111}$ to determine the input matching network. Fourth, with the input matching network in place, we place a second source at port 2 and find the quasi-linear value of $S_{2222}$, which allows us to determine the output matching network. Fifth, we use the optimization feature of the simulator to minimize $S_{1111}$ by varying the line lengths of the input and output matching circuits. Sixth, and finally, we add 4 GHz and 6 GHz filters at the output (and re-determine the proper input and output matching circuits) in order to reduce the values of $S_{2141}$ and $S_{2161}$, which in turn increases the value of $S_{2121}$ and cleans up the output waveform.

Diode Only
In this example, we use a compact model to simulate a commercial Schottky-barrier diode. The model includes a series resistance $R_s$ of 14 Ω, a junction capacitance at zero voltage $C_{j0}$ of 0.08 pF, and a reverse saturation current $I_s$ of $3 \times 10^{-10}$ A.
[Figure caption: Nonlinear large-signal S-parameters used to characterize a two-port device excited by a single-tone signal at port 1.]
First, we perform a harmonic-balance simulation on the diode, sweeping the bias voltage to determine which condition gives the highest value of $S_{2121}$ for $a_{11}$ = 1.0 V. Note that in all simulations we set the generator impedance $Z_G$ and the load impedance $Z_L$ to 50 Ω. After sweeping the voltage, we determine that the optimum forward bias is +0.48 V.

Diode With 1 GHz and 2 GHz Filters
With a stimulus of $a_{11}$ = 1.0 V and a forward bias of +0.48 V, we add filtering networks to separate the input and output signals. On the input side, we place a 2 GHz, λ/4 (λ/8 at 1 GHz) open-circuited stub. This creates an RF short at 2 GHz, preventing the output power generated in the diode from traveling backward. On the output side, we place a 1 GHz, λ/4 open-circuited stub. This creates an RF short at 1 GHz, preventing any signal at 1 GHz from traveling forward. Table 1 lists the simulated values for $S_{1111}$ to $S_{1161}$, $S_{2111}$ to $S_{2161}$, $G_2$, and $G_2/G$ for each of the design stages, where G is the expanded power gain and $G_2$ is the expanded power gain confined to the second harmonic, as defined in [11]. With the 1 GHz and 2 GHz filters in place, we see that the value of $|S_{1121}|$ decreases from 0.170 to $1.3 \times 10^{-5}$, the value of $|S_{2111}|$ decreases from 0.536 to $3.3 \times 10^{-5}$, and $G_2$ increases from -14.16 dB to -9.73 dB.
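The filtering action of the stubs follows from the input impedance of a lossless open-circuited stub, -j Z0 cot(βl). The Python sketch below evaluates it for the two stubs described above. It assumes ideal, lossless, non-dispersive lines (electrical length proportional to frequency) and a 50 Ω stub impedance; the function name is illustrative.

```python
import math


def open_stub_impedance(f_hz, f_quarter_wave_hz, z0=50.0):
    """Input impedance -j*Z0*cot(beta*l) of a lossless open-circuited stub
    that is a quarter wavelength long at f_quarter_wave_hz."""
    beta_l = (math.pi / 2) * (f_hz / f_quarter_wave_hz)
    return complex(0.0, -z0 * math.cos(beta_l) / math.sin(beta_l))


# Input-side stub (quarter wave at 2 GHz).
z_in_2ghz = open_stub_impedance(2e9, 2e9)   # essentially zero: RF short at 2 GHz
z_in_1ghz = open_stub_impedance(1e9, 2e9)   # lambda/8 long: -j*Z0, a shunt capacitance

# Output-side stub (quarter wave at 1 GHz).
z_out_1ghz = open_stub_impedance(1e9, 1e9)  # RF short at 1 GHz
z_out_2ghz = open_stub_impedance(2e9, 1e9)  # half wave long: very large, nearly invisible

print(abs(z_in_2ghz), z_in_1ghz, abs(z_out_1ghz), abs(z_out_2ghz))
```

The input-side stub is not transparent at 1 GHz (it adds a shunt reactance of -j Z0), which is one reason the input match must be determined with the filters already in place.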

Diode With 1 GHz and 2 GHz Filters and Input Matching
Once the filters are placed in the circuit, we make use of the complex-valued $S_{1111}$ to design the input matching network with the well-known single, open-circuited stub technique. This is possible, assuming that no energy is transferred through frequency down-conversion, as discussed in Sec. 2.6. We see in Table 1 that $|S_{1111}|$ reduces from 0.569 without the input matching network to $9.4 \times 10^{-2}$ with the input matching network in place. Likewise, $G_2$ increases from -9.73 dB to -9.69 dB.
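A minimal sketch of the single shunt open-stub calculation is given below, under the assumption that the complex reflection coefficient at the fundamental can be treated like a linear load reflection coefficient. The functions are illustrative, the lines are ideal 50 Ω lines, only one of the two possible solutions is returned, and the phase of the reflection coefficient is hypothetical (only its magnitude, 0.569, is reported in the text).

```python
import cmath
import math


def single_open_stub_match(gamma, z0=50.0):
    """Return (d, l) in wavelengths: line length to the stub and open-stub length."""
    zl = z0 * (1 + gamma) / (1 - gamma)
    r, x = zl.real, zl.imag
    # Choose t = tan(beta*d) so that the line admittance has real part 1/z0.
    if abs(r - z0) < 1e-12:
        t = -x / (2 * z0)
    else:
        t = (x + math.sqrt(r * ((z0 - r) ** 2 + x ** 2) / z0)) / (r - z0)
    d = math.atan(t) / (2 * math.pi) % 0.5
    y_line = (z0 + 1j * zl * t) / (z0 * (zl + 1j * z0 * t))
    # Open stub admittance j*tan(beta*l)/z0 must cancel the remaining susceptance.
    l = math.atan(-y_line.imag * z0) / (2 * math.pi) % 0.5
    return d, l


def matched_reflection(gamma, d, l, z0=50.0):
    """Reflection coefficient seen looking into the line-plus-stub network."""
    zl = z0 * (1 + gamma) / (1 - gamma)
    t = math.tan(2 * math.pi * d)
    y = (z0 + 1j * zl * t) / (z0 * (zl + 1j * z0 * t))
    y += 1j * math.tan(2 * math.pi * l) / z0
    return (1 / z0 - y) / (1 / z0 + y)


gamma_in = cmath.rect(0.569, math.radians(-60.0))  # hypothetical phase
d, l = single_open_stub_match(gamma_in)
print(d, l, abs(matched_reflection(gamma_in, d, l)))
```

In this idealized linear calculation the match is perfect; the nonzero residual reported in Table 1 is to be expected when the same procedure is applied to a nonlinear device in a harmonic-balance simulation.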

Diode With 1 GHz and 2 GHz Filters, Plus Input and Output Matching
Whereas our input matching network is designed for 1 GHz, our output matching network must be designed for 2 GHz. While the circuit is operating under its normal drive condition ($a_{11}$ = 1.0 V and a forward bias of +0.48 V), we place a second source at port 2, excited by a small-signal tone ($\Delta a_{22}$ = 0.01 V) at a frequency offset of 10 kHz from the desired 2 GHz, to give us the quasi-linear value of $S_{2222}$, which allows us to determine the output matching network. We make use of $S_{2222}$ to design the output matching network with the well-known single, open-circuited stub technique. We see in Table 1 that with the output matching network in place, the value of $|S_{2121}|$ is only marginally increased from 0.326 to 0.328. This is because the value of $S_{2222}$ is relatively low, which means the output is already almost matched to 50 Ω. We also note that $G_2$ increases from -9.69 dB to -9.65 dB.

Diode With 1 GHz and 2 GHz Filters, Plus Optimized Input and Output Matching
With the filters and matching networks in place, we use the optimization feature of the simulator to minimize $S_{1111}$ by varying the lengths of the lines in the input and output matching circuits. Doing this decreases the value of $|S_{1111}|$ from $8.7 \times 10^{-2}$ to $6.0 \times 10^{-3}$ while increasing the value of $|S_{2121}|$ from 0.328 to 0.331 and $G_2$ from -9.65 dB to -9.60 dB.
In order to clean up the output waveform, we add 4 GHz and 6 GHz filters, in the form of λ/4 open-circuited stubs, at the output. With these filters placed in the circuit, we re-determine the proper input and output matching conditions. After optimizing the circuit once again, the value of $|S_{2141}|$ decreases from $4.0 \times 10^{-2}$ to $1.4 \times 10^{-6}$ and the value of $|S_{2161}|$ decreases from $2.9 \times 10^{-2}$ to $2.7 \times 10^{-6}$. The addition of these filters, in turn, slightly increases $|S_{2121}|$ from 0.331 to 0.332 and $G_2$ from -9.60 dB to -9.56 dB. At this final design stage, the overall power gain is nearly -9.56 dB, since the ratio $G_2/G$ = 0.999. The semi-empirical analysis of [10] predicts a maximum gain of -9 dB. Figure 3 illustrates the final design of the single-diode resistive doubler circuit, and Fig. 4 shows the time-domain plots of $a_1$ and $b_2$ for the final design of the simulated 1 GHz frequency-doubler circuit.

Summary of Sec. 3
We illustrated how nonlinear large-signal S-parameters can be used as a tool in the design process of a single-diode 1 GHz frequency-doubler. Specifically, we used $S_{1111}$ to determine the input matching network, $S_{2222}$ to determine the output matching network, and $S_{11k1}$, $S_{21k1}$ (for k = 1 to 6), and $G_2$ to quantify the performance of the circuit at each stage.
By the final stage of the design, we had created a doubler with an overall power gain of -9.56 dB, not far from the maximum possible predicted value of -9 dB.

S-Parameters from Artificial Neural Network Models Trained With Measurement Data
Although nonlinear large-signal S-parameters can be easily determined for an existing model in a commercial harmonic-balance simulator by forcing all a's other than $a_{11}$ to zero, they cannot be determined directly from measurements. With currently available NVNAs, the nonlinear DUT, in conjunction with the impedance mismatches and harmonics from the system, makes it impossible to set all a's other than $a_{11}$ (assuming port 1 excitation) to zero. In order to overcome this obstacle, we propose a method [12] that makes use of multiple measurements of a DUT using a second source with isolators, as shown in Fig. 5. This measurement set-up is similar to that introduced by Verspecht et al. [6][7] to generate "nonlinear scattering functions." As a side note, we compare and contrast the "nonlinear scattering functions" with our definitions of nonlinear large-signal scattering parameters in the Appendix.
Volume 109, Number 4, July-August 2004

Methodology
To illustrate our technique of generating nonlinear large-signal S-parameters, let us consider the case where a DUT is excited at port 1 by a single-tone signal at frequency $f_1$ and signal level $|a_{11}|$. Utilizing a second source, we take multiple measurements of a nonlinear circuit for different values of $a_{mn}$ [(m, n) ≠ (1, 1)]. We then use these data to develop an artificial neural network (ANN) model that maps values of a's to b's, as shown in Fig. 6. Once the ANN model is trained and verified, the nonlinear large-signal S-parameters are obtained by interpolating b's from the measured results for nonzero values of $a_{mn}$ [(m, n) ≠ (1, 1)] to the desired values for $a_{mn}$ [(m, n) ≠ (1, 1)] equal to zero, as shown in Fig. 7. Alternatively, other conditions may be called for, where $a_{mn}$ ≠ 0, depending on the desired application-specific figure of merit.
One popular type of ANN architecture, which is used in our work, is a feed-forward, three-layer perceptron structure (MLP3) consisting of an input layer, a hidden layer, and an output layer [13]. The hidden layer allows for complex models of input-output relationships. ANNs learn relationships among sets of input-output data that are characteristic of the device or system under consideration. After the input vectors are presented to the input neurons and output vectors are computed, the ANN outputs are compared to the desired outputs and errors are calculated. Error derivatives are then calculated and summed for each weight until all of the training sets have been presented to the network. The error derivatives are used to update the weights for the neurons, and training continues until the errors become no greater than prescribed values. In our study, we have utilized software developed by Zhang et al. [14] to construct our ANN models.
To test our method of generating nonlinear large-signal S-parameters, we fabricated a wafer-level test circuit using a Schottky diode in a series configuration, as shown in Fig. 8. The two-port diode circuit was fabricated on an alumina substrate by bonding a beam-lead diode package to the gold metalization layer with silver epoxy. The diode was located in the middle of the coplanar waveguide (CPW) transmission lines, with short lines connecting the diode to probe pads at both ports. We measured the test circuit on an NVNA using an on-wafer VNA line-reflect-reflect-match (LRRM) calibration, along with signal amplitude and phase calibrations. This process places the reference plane at the tips of the wafer probes used to connect with the CPW leads.
For all measurements, the first source, located at port 1, used a sine-wave excitation of frequency 900 MHz and magnitude $|a_{11}|$ ≈ 0.178 V (-5 dBm in a 50 Ω environment) at the probe tips. The second source was connected to port 2 and used a sine-wave excitation of frequency 900 MHz and $|a_{21}|$ ≈ 0.178 V. The diode was forward-biased to +0.2 V through the probe tips. In order to obtain the nonlinear large-signal S-parameters $S_{11k1}$ and $S_{21k1}$, the excitation from source 1 was held constant, while the phase of source 2 was randomly changed for 500 different measurements that varied slightly in magnitude. Figure 9 plots the resulting measurements of $a_{21}$ in the complex plane. The nonlinearities in the test circuit, along with impedance mismatches, created other input components at higher harmonics, as shown in Figs. 10-13 for the second and third harmonics ($a_{12}$, $a_{13}$, $a_{22}$, and $a_{23}$). These variations in $a_{ij}$ allowed us to create an ANN model that could be used to interpolate b's from the measured results for nonzero values of $a_{mn}$ [(m, n) ≠ (1, 1)], as shown in Figs. 14 and 15 for $b_{11}$ and $b_{21}$, to the desired values for $a_{mn}$ [(m, n) ≠ (1, 1)] equal to zero, or alternatively to another desired device condition.
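The quoted equivalence between -5 dBm and 0.178 V can be checked with a short Python calculation. It is consistent with reading the stated magnitude as the peak voltage amplitude of a sine wave in a 50 Ω system; that reading is our inference, and the function name is illustrative.

```python
import math


def dbm_to_peak_volts(p_dbm, z0=50.0):
    """Peak voltage of a sine wave that delivers p_dbm into a z0-ohm load."""
    p_watts = 1e-3 * 10 ** (p_dbm / 10)
    return math.sqrt(2 * p_watts * z0)


print(round(dbm_to_peak_volts(-5.0), 3))  # 0.178
```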

Sensitivity Analysis of ANN Models
Data from the 500 measurements were used to develop two ANN models, one for mapping values from the first five harmonics of $a_1$ and $a_2$ ($a_{11}$, $a_{12}$, …, $a_{15}$, $a_{21}$, $a_{22}$, …, $a_{25}$) to the corresponding harmonics of $b_1$, and the other for mapping the same inputs to $b_2$.
First, we varied the number of hidden neurons from 1 to 20. All other parameters were held constant. Specifically, the 500 measurement points were divided into 250 training points and 250 testing points, and we used the conjugate gradient method for training. Table 2 lists the average testing errors and correlation coefficients for the models that map $a_1$ and $a_2$ to $b_1$, and Table 5 lists the average testing errors and correlation coefficients for the models that map $a_1$ and $a_2$ to $b_2$. Both mappings show similar trends. The average testing errors decreased with increasing numbers of hidden neurons until around 14 or 16, where the errors were minimized. For more than 16 hidden neurons, the trend reversed and the errors appeared to start increasing again. Figure 16 plots the average testing errors as a function of the number of hidden neurons for both mappings.
Next, we varied the number of training points from 5 to 250. All other parameters were held constant. The number of hidden neurons was set to 14, since we found that to be an ideal number from the previous analysis, and 250 testing points were used for verification. Table 3 lists the average testing errors and correlation coefficients for the models that map $a_1$ and $a_2$ to $b_1$, and Table 6 lists the average testing errors and correlation coefficients for the models that map $a_1$ and $a_2$ to $b_2$. Once again, both mappings showed similar trends. The average testing errors decreased for an increasing number of training points. However, as more and more training points were added, diminishing returns on the testing errors were evident. Figure 17 plots the average testing errors as a function of the number of training points for both mappings.
Finally, we varied the number of testing points from 5 to 250. All other parameters were held constant. The number of hidden neurons was once again set to 14, and the same 250 training points were used for model development. Table 4 lists the average testing errors and correlation coefficients for the models that map $a_1$ and $a_2$ to $b_1$, and Table 7 lists the average testing errors and correlation coefficients for the models that map $a_1$ and $a_2$ to $b_2$. Both mappings showed that the average testing errors varied little with the number of testing points. Figure 18 plots the average testing errors as a function of the number of testing points for both mappings.

Results and Comparison for Sec. 4
Based on the results of our sensitivity analysis, we decided to use 250 training points and 250 testing points to train and verify the two ANN models. We chose to use 14 hidden neurons for mapping values from the first five harmonics of a 1     After the ANN models were developed, the nonlinear large-signal S-parameters, S 11k1 and S 21k1 (k = 1, 2, …, 5), were obtained by interpolating b 1k and b 2k from measured results for nonzero values of a 12 , a 13 , …, a 15 and a 21 , a 22 , …, a 25 to the desired values for a 12 , a 13 , …, a 15 and a 21 , a 22 , …, a 25 equal to zero. Figure 19 shows the interpolated value of b 11 (= S 1111 · a 11 ) when a 12 , a 13 , …, a 15 and a 21 , a 22 , …, a 25 were set equal to zero, and Fig. 20 shows the interpolated value of b 21 (= S 2111 · a 11 ) when a 12 , a 13 , …, a 15 and a 21 , a 22 , …, a 25 were set equal to zero.
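The idea of fitting a model to measurements taken with a nonzero second-source signal and then evaluating it with that signal set to zero can be illustrated with a deliberately simplified Python sketch. It replaces the ANN with a complex linear least-squares fit and uses synthetic data, so it is a stand-in for the concept only: all parameter values, the noise level, and the variable names are hypothetical, and a real device would also show nonlinear dependence on the other incident waves, which is what motivates the ANN in the text.

```python
import cmath
import random

random.seed(0)

# Synthetic "device" (hypothetical values): b_21 = S_2111*a_11 + S_2211*a_21 + noise.
a_11 = 0.178 + 0j
true_s_2111 = 0.5 * cmath.exp(-0.9j)
true_s_2211 = 0.2 * cmath.exp(0.4j)

# 500 measurements with random second-source phase and slightly varying magnitude.
a_21 = [cmath.rect(0.178 * random.uniform(0.95, 1.05),
                   random.uniform(-cmath.pi, cmath.pi)) for _ in range(500)]
b_21 = [true_s_2111 * a_11 + true_s_2211 * x
        + complex(random.gauss(0, 1e-4), random.gauss(0, 1e-4)) for x in a_21]

# Complex least-squares fit of b_21 = intercept + slope * a_21.
n = len(a_21)
x_mean = sum(a_21) / n
b_mean = sum(b_21) / n
slope = (sum((x - x_mean).conjugate() * (b - b_mean) for x, b in zip(a_21, b_21))
         / sum(abs(x - x_mean) ** 2 for x in a_21))
b_at_zero = b_mean - slope * x_mean  # fitted b_21 with a_21 set to zero

s_2111 = b_at_zero / a_11
print(abs(s_2111 - true_s_2111))
```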
We compared our results to a compact model provided by the manufacturer and simulated in commercial harmonic-balance software to get an independent check on our methodology. For this comparison, we provided the simulator with the identical biasing conditions on the diode and a stimulus of the same magnitude used in the measurements for a 11, and set all other a's to zero. Supplying the simulated circuit with an a 11 of the same magnitude as in the measurement should give the same values of b 1k and b 2k as the interpolated values of b 1k (= S 11k1 · a 11) and b 2k (= S 21k1 · a 11) determined by the ANN models when a 12, a 13, …, a 15 and a 21, a 22, …, a 25 are set equal to zero. Figures 19 and 20 show that the simulated values of b 11 and b 21 agree with those determined from the measurement-based ANN models.
Quantitatively, the differences between the ANN and equivalent-circuit models are shown in Table 8.

Summary of Sec. 4
We described a method of extracting nonlinear large-signal S-parameters, using an NVNA equipped with isolators and a second source. First, we showed how multiple measurements of a nonlinear circuit could be used to train artificial neural networks. Then, we extracted the desired S-parameters by interpolating the ANN models for all a's equal to zero other than a 11. We checked our approach by comparing our results to a compact model simulated in commercial harmonic-balance software, and showed that the two methods agree well.
We also performed a sensitivity analysis on the ANN models, and discovered the following: (1) The average testing error decreases for an increasing number of training points. However, as more and more training points are added, diminishing returns on the testing errors are evident. (2) As the number of hidden neurons is increased, the average testing error decreases until around 14 hidden neurons, at which point more hidden neurons have no benefit and can actually lead to increases in testing error. (3) The number of testing points does not drastically affect the testing error. In fact, no more than 25 testing points are needed for the models tested.

Overall Summary
Fig. 20. Values of S 2111 · a 11 were determined from the measurement-based ANN model (square) and the harmonic balance simulation using a compact model (triangle).
In this paper, we introduced nonlinear large-signal scattering parameters representing a new type of frequency-domain mapping that relates incident and reflected signals. Unlike classical S-parameters, nonlinear large-signal S-parameters take harmonic content into account and depend on the signal magnitudes. First, we presented a general form of nonlinear large-signal S-parameters and showed that they reduce to classic S-parameters in the absence of nonlinearities. We also introduced nonlinear large-signal impedance (Z) and admittance (Y) parameters, and presented equations that relate the different representations. Next, we considered two simplified cases, a one-port network and a two-port network, each with a single-tone excitation. For the one-port network, we showed that the equation relating S and Z reduces to the same well-known equation for the linear case, assuming no power is transferred in the form of frequency down-conversion. For the two-port case, we extracted input reflection coefficients and forward transmission coefficients, which can be useful for designing circuits such as amplifiers and frequency multipliers. In addition, we derived a quasi-linear approximation of the output reflection coefficient under normal operating conditions. These three two-port parameters allow a designer to "see" application-specific engineering figures of merit that are similar to what he or she is accustomed to in the linear world.
Next, we illustrated how nonlinear large-signal S-parameters can be used as a tool in the design process of a single-diode 1 GHz frequency-doubler. Specifically, we used S 1111 to determine the input matching network, S 2222 to determine the output matching network, and S 11k1, S 21k1 (for k = 1 to 6), and G 2 to quantify the performance of the circuit at each stage. By the final stage of the design, we had created a doubler with an overall power gain of -9.56 dB, a value not far from the maximum possible predicted value of -9 dB.
For the case where a nonlinear model is not readily available, we described a method of extracting nonlinear large-signal S-parameters, using an NVNA equipped with isolators and a second source. First, we showed how multiple measurements of a nonlinear circuit could be used to train artificial neural networks. Then, we extracted the desired S-parameters by interpolating the ANN models for all a's equal to zero other than a 11. We checked our approach by comparing our results to a compact model simulated in commercial harmonic-balance software, and showed that the two methods agree well. We also performed a sensitivity analysis on the ANN models, and discovered the following: (1) The average testing error decreases for an increasing number of training points. However, as more and more training points are added, diminishing returns on the testing errors are evident. (2) As the number of hidden neurons is increased, the average testing error decreases until around 14 hidden neurons, at which point more hidden neurons have no benefit and can actually lead to increases in testing error. (3) The number of testing points does not drastically affect the testing error. In fact, no more than 25 testing points are needed for the models tested.

Appendix A. Comparing Nonlinear Large-Signal S-Parameters With Nonlinear Scattering Functions
Here, we compare the nonlinear large-signal S-parameters, introduced in this paper, to another form of nonlinear mapping, known as nonlinear scattering functions, introduced by Verspecht [6][7].
For a two-port nonlinear device, excited by a single-tone signal, and assuming all harmonic signals are relatively small compared to the fundamental signals, Verspecht defines nonlinear scattering functions as (59) … Re(a 11), Re(a 21), and Im(a 21). The imaginary component of a 11 is omitted, with the assumption that the wave variables are phase referenced such that the phase of a 11 is set to zero. F kp, G kpij, and H kpij are assumed to be complex constants for a given bias and fundamental drive condition. Note that these three terms do not depend upon the higher harmonic signal levels. With the a ij wave variables split into real and imaginary components, G kpij and H kpij serve to map a ij circles centered at zero to b kp ellipses with variable axes also centered at zero, as shown in Fig. 21. The F kp terms translate the ellipses about the complex plane.
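As a sketch of the structure described in this paragraph, the scattering functions can be written as an offset term plus contributions that are linear in the real and imaginary parts of the incident waves. The index ranges of the sums are an assumption made for illustration, with the Im(a 11) term dropped because of the phase reference.

```latex
% Assumed general structure: k = port and p = harmonic of b; i = port and j = harmonic of a
b_{kp} = F_{kp}
       + \sum_{i}\sum_{j} G_{kpij}\,\mathrm{Re}(a_{ij})
       + \sum_{i}\sum_{j} H_{kpij}\,\mathrm{Im}(a_{ij}),
\qquad \mathrm{Im}(a_{11}) = 0
```

In this form, F kp shifts the response in the complex plane, while each pair G kpij, H kpij acts separately on the real and imaginary parts of a ij, which is what allows a circle in a ij to become an ellipse in b kp.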
For illustrative purposes, let us consider b 11, taking into account the first three harmonics. Doing this, Eq. (59) reduces to Eq. (60) or Eq. (61).
If we now consider the nonlinear large-signal S-parameter representation for b 11, once again assuming a two-port network and taking into account the first three harmonics, we have Eq. (63). Here, the S ijkl are functions of all of the harmonics, not just the fundamental terms, so any change in any a jl requires a new set of S ijkl to be determined. Separating the real and imaginary components of the a's, we can express Eq. (63) as Eq. (64). Once again, the imaginary component of a 11 is omitted, with the phase reference such that the phase of a 11 is set to zero.
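The S-parameter side of this comparison can be sketched from the index convention used in relations such as b 1k = S 11k1 · a 11, assuming the general form in which b ik is the sum of S ijkl · a jl over ports j and harmonics l. The expansion below is offered under that assumption and is not necessarily written in the same form as the paper's numbered equations.

```latex
% \mathrm{j} is the imaginary unit; the subscripts j and l are port and harmonic indices
b_{11} = \sum_{j=1}^{2}\sum_{l=1}^{3} S_{1j1l}\,a_{jl}
       = S_{1111}a_{11} + S_{1112}a_{12} + S_{1113}a_{13}
       + S_{1211}a_{21} + S_{1212}a_{22} + S_{1213}a_{23}

% Real/imaginary split of each term, with \mathrm{Im}(a_{11}) = 0 by the phase reference
S_{1j1l}\,a_{jl} = S_{1j1l}\,\mathrm{Re}(a_{jl}) + \mathrm{j}\,S_{1j1l}\,\mathrm{Im}(a_{jl})
```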
We can now equate the nonlinear large-signal S-parameters of Eq. (64) to the nonlinear scattering functions of Eq. (61), with the understanding that this is only generally valid for the special case when the nonlinear large-signal S-parameters are constant for a given bias and fundamental drive level, as F kp, G kpij, and H kpij are defined to be. Normally, however, the nonlinear large-signal S-parameters depend upon the higher harmonics as well as on the bias and fundamental drive level. The implication of this special case will be discussed shortly, after Eqs. (65)-(70).
Fig. 21. G kpij and H kpij serve to map a ij circles centered at zero to b kp ellipses with variable axes also centered at zero, neglecting F kp for illustrative purposes.
… which implies b kp must be an analytic function of a ij. A complex-valued function is said to be analytic on an open set W if it has a derivative at every point of W. This is generally true only when b kp is a linear function of a ij. Thus, equating the nonlinear large-signal S-parameters with the nonlinear scattering functions is generally valid only in the small-signal, linear case. As we mentioned earlier, Eqs. (65)-(70) are only generally valid in the special case when the nonlinear large-signal S-parameters are constant for a given bias and fundamental drive level, as F kp, G kpij, and H kpij are defined to be. Since this is not generally true, the formulations for nonlinear large-signal S-parameters and nonlinear scattering functions are not equivalent.
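The analyticity requirement can be illustrated with a single term of the comparison. The sketch below uses generic symbols G, H, and S for one matched set of coefficients and writes a = x + jy; it is a standard complex-analysis argument and is not necessarily in the same form as Eqs. (65)-(70).

```latex
% Single-term comparison, with a = x + \mathrm{j}y and \mathrm{j} the imaginary unit
G\,x + H\,y = S\,(x + \mathrm{j}y) \quad \text{for all } x,\, y
\;\Longrightarrow\; G = S, \quad H = \mathrm{j}S = \mathrm{j}G

% Cauchy-Riemann condition for f(a) = G\,x + H\,y to be analytic in a
\frac{\partial f}{\partial y} = \mathrm{j}\,\frac{\partial f}{\partial x}
\;\Longleftrightarrow\; H = \mathrm{j}G
```

When G and H are independent constants, the condition H = jG generally fails, so the mapping is not analytic in a and cannot be represented by a single complex constant S.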
We can draw a few important conclusions, however, after attempting to equate the two formulations. First, if G kpij and H kpij are allowed to be functions of higher harmonics, then only one of them, either G kpij or H kpij , or equivalently S ijkl , is required since Eq. (70) shows that they are not independent. Second, if the nonlinear large-signal S-parameters are complex constants for a given bias and fundamental drive level and are not functions of the higher harmonics, the parameters have the limitation that they cannot map circles into ellipses, but rather can only map circles into circles, as shown in Figure 22. This is because S ijkl is a single, complex constant rather than a pair of independent complex constants such as G kpij and H kpij . Thus, if S ijkl is not dependent upon higher harmonics, it acts like a linear S-parameter.
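The difference between the two mappings can be checked numerically. The sketch below maps a circle of incident-wave values through a pair of independent constants G and H, and through a single constant S, and then compares how much the output magnitude varies around the circle. The numerical values of G, H, S, and the circle radius are hypothetical and chosen only for illustration.

```python
import cmath
import math

def map_scattering_function(G, H, a):
    """Single-term scattering-function mapping: G * Re(a) + H * Im(a)."""
    return G * a.real + H * a.imag

def map_s_parameter(S, a):
    """Single-term S-parameter mapping with a constant S: S * a."""
    return S * a

def magnitude_spread(points):
    """Difference between the largest and smallest magnitude in a set of points."""
    mags = [abs(p) for p in points]
    return max(mags) - min(mags)

# Circle of incident-wave values centered at zero (hypothetical radius).
radius = 0.1
circle = [cmath.rect(radius, 2 * math.pi * n / 360) for n in range(360)]

# Hypothetical constants: G and H are independent, S is a single complex number.
G, H = 0.8 + 0.2j, 0.3 - 0.5j
S = 0.8 + 0.2j

ellipse = [map_scattering_function(G, H, a) for a in circle]
circle_out = [map_s_parameter(S, a) for a in circle]

# With G = S and H = jS the two mappings coincide.
same_as_s = [map_scattering_function(S, 1j * S, a) for a in circle]

print("magnitude spread, G/H mapping:", magnitude_spread(ellipse))
print("magnitude spread, S mapping:  ", magnitude_spread(circle_out))
```

A spread near zero indicates that the output is still a circle, while a clearly nonzero spread indicates an ellipse. Setting H = jG collapses the two-constant mapping onto the single-constant one, which is the dependence between G kpij and H kpij noted in the first conclusion.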