Signal and Noise Modeling of Microwave Transistors Using Characteristic Support Vector-based Sparse Regression

In this work, accurate and reliable Signal (S-) and Noise (N-) parameter black-box models for a microwave transistor are constructed based on sparse regression, using the Support Vector Regression Machine (SVRM) as a nonlinear extrapolator trained by the data measured at the typical bias currents belonging to only a single bias voltage in the middle region of the device operation domain of (VDS/VCE, IDS/IC, f). SVRMs are novel learning machines combining convex optimization theory with generalization; therefore they guarantee the global minimum and a sparse solution which can be expressed as a continuous function of the input variables using a subset of the training data, the so-called Support Vectors (SVs). Thus the magnitude and phase of each S- or N-parameter are expressed analytically, valid over a wide range of the device operation domain, in terms of the Characteristic SVs obtained from the substantially reduced measured data. The proposed method is implemented successfully in the modeling of the two LNA transistors ATF-551M4 and VMMK-1225 with their large operation domains, and a comparative error-metric analysis is given in detail against the counterpart method, the Generalized Regression Neural Network (GRNN). It can be concluded that Characteristic Support Vector-based sparse regression is an accurate and reliable method for the black-box signal and noise modeling of microwave transistors that extrapolates a reduced amount of training data, consisting of the S- and N-data measured at the typical bias currents belonging to only a middle bias voltage, in the form of continuous functions into the wide operation range.


Introduction
Fast and accurate models of microwave devices and antennas are indispensable in contemporary microwave engineering. In today's RF and microwave technology, there is an ever-increasing demand for a higher level of system integration, which leads to massive computational tasks during simulation, optimization and statistical analyses and requires efficient modeling methods so that the whole process can achieve the required reliability. However, modeling still remains a major bottleneck for efficient RF-Microwave CAD. Among all component models, unreliable transistor models can easily lead to an unsuccessful design because of their strong influence on the overall circuit performance. Thus, the efficiency of such models in terms of accuracy and speed is critical to assure a reliable design.
Artificial Neural Networks (ANNs) have emerged as valuable tools to extend the repertoire of statistical methods. Particularly the Back-Propagation Multi-Layer Perceptrons (BPMLPs) have been employed for nonlinear interpolation based on "learning" from measured or simulated data, in the fast, accurate and reliable modeling of both active and passive microwave devices [1][2][3][4][5][6]. Today a fast, accurate and reliable Signal (S-) and Noise (N-) black-box model of a microwave transistor can be achieved by a single simple BPMLP with one hidden layer, which is capable of the simultaneous generalization of 12 Scattering (S-) and Noise (N-) functions into the entire operation domain of the bias condition VDS/VCE, IDS/IC and the frequency f for all the configuration types [3]. However, these so-called Back-Propagation Neural Networks (BPNNs) suffer from a number of disadvantages, the most important of which can briefly be summarized as follows: (i) the variants of back-propagation can be shown to converge to local minima of the error surface; (ii) as most algorithms use (pseudo-)random numbers, the result of training varies between runs even for identical networks and training datasets. Therefore BPNNs in modeling need a statistical analysis of the results of many runs.
In recent years, regression with Support Vector Machines (SVMs) and the probability-based neural network named the Generalized Regression Neural Network (GRNN) have removed these handicaps and made it possible to substantially reduce the expensive fine discrete training data, with typical modeling applications to transistors [7][8][9][10], integrated microstrip and slot antennas [11], [12], printed transmission lines [13][14][15] and vertical interconnects in microwave packaging structures [16].
The SVM, based on the Structural Risk Minimization (SRM) principle, is one of the most widely used learning algorithms and achieves superior generalization performance for both classification and regression problems [16]. The SVM is systematic and properly motivated by statistical learning theory [17]. Training of the SVM involves optimization of a convex objective function that is minimized globally to complete the learning process without suffering from local minima [17], [18]. In addition, the SVM can handle a large input and can automatically identify a small subset consisting of informative points, namely the Support Vectors (SVs) [17]. Thus sparseness of the solution is obtained: the large amount of data is fully characterized by the set of SVs, a subset of the training set. Especially the working principle of the SVRM based on small-sample statistical learning theory is utilized in [13], [14] to build a knowledge-based SVRM model for the synthesis of printed transmission lines that is as fast as the coarse models and at the same time as accurate as the fine models. In those works, the SVs are determined using the coarse data generators, which are the empirical synthesis formulae of the related transmission line, and the accuracy of the models is increased by supplying the corresponding fine values of the SVs from the fine data generators, which are the full-wave EM simulators. Thus, the amount of expensive fine training data is reduced substantially, as pointed out in [11][12][13][14][15][16].
In this paper, we work out the determination of the characteristic SVs for each S- and N-parameter using the sparse training data, which is then generalized throughout a wide range of the operation domain. Thus fast, accurate and reliable black-box S- and N-parameter models are constructed based on sparse regression, using the SVRM as a nonlinear extrapolator trained by the data measured at the typical bias currents of the single bias voltage in the middle region of the device operation domain of (VDS/VCE, IDS/IC, f). This modeling method is facilitated by two features: the first is that the black-box characterization parameters depend upon the bias currents more than the bias voltages; the second is the superior generalization ability of the SVRM. On the other hand, the training time and memory of the SVM are expensive and strongly correlated to the number of training patterns ℓ, as O(ℓ³) and O(ℓ²), respectively [17][18][19]. Thus at least approximately 50% of the training data is removed, which corresponds to 1/8 of the training time and 1/4 of the memory complexity of the SVRM used as the nonlinear interpolator in the S- and N-modeling in [3][4][5][6][7][8]. The proposed method is implemented in the modeling of the low-noise microwave transistors ATF-551M4 and VMMK-1225 as a case study.
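The cost arithmetic above can be checked with a short sketch; the O(ℓ³) time and O(ℓ²) memory exponents are the scalings quoted in the text, and the function below is purely illustrative:

```python
# Relative SVM training cost after removing a fraction of the training
# patterns, using the O(l^3) time and O(l^2) memory scalings.

def relative_cost(reduction):
    """reduction: fraction of the original training patterns removed."""
    kept = 1.0 - reduction
    return kept ** 3, kept ** 2   # (time ratio, memory ratio)

time_ratio, mem_ratio = relative_cost(0.5)
print(time_ratio, mem_ratio)      # 0.125 0.25, i.e. 1/8 time and 1/4 memory
```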
The paper is organized as follows: the fundamentals of the SVRM are briefly given in the next section. The third section is devoted to the case study, where SVRMs are used as nonlinear extrapolators in the modeling of the microwave transistors, together with an error-metric comparison against the performance of the counterpart method (GRNN). The paper ends with the conclusion.

Support Vector Regression Machines
The SVRM builds a nonlinear function between a given input and its corresponding output in the training data. This continuous nonlinear relation can be used to predict outputs for inputs not included in the training data. The nonlinear function is learned by a linear learning machine in a kernel-induced feature space. As in the classification case, the learning algorithm minimizes a convex cost function and its solution is sparse. In order to explain the mathematical framework of the SVRMs, let us consider a training dataset {(xi, yi), i = 1, ..., ℓ}. The SVRM tries to find the mapping function f(x) between the input variable vector x and the desired output variable y, which in our case are the operation parameters of the device (VDS/VCE, IDS/IC, f) and the magnitude or phase of each S- or N-parameter, respectively. In the SVRM model, the regression function f(x) is expressed as follows [14][15][16][17][18]:

f(x) = w · φ(x) + b    (2.1)

where w = (w1, w2, ..., wN) and φ(x) = (φ1(x), φ2(x), ..., φN(x)). This step is equivalent to mapping the x input space into a new space F.
Thus, f(x) is a nonlinear function in the n-dimensional x input space and a linear function in the N-dimensional feature space F. The dimension N of the feature space may differ from the dimension n of the input space. However, the strategy to follow in the selection of the feature space F is to convey the essential information of the original data into its representation in the new space. Thus, an SVRM can be constructed in two steps: first, a fixed nonlinear mapping φ(x) transforms the data into a feature space F, and then the linear machine built in this feature space is used to perform the regression on the data. In this manner, we refer to the quantities w and b in (2.1) as the weight vector and the bias. The regression function in (2.1) is built by determining w and b through the strategy of "seeking to optimize the generalization bounds". This relies on defining a loss function that ignores errors within a certain distance of the true value; this type of function is referred to as the ε-insensitive loss function. The use of the ε-insensitive loss function has the advantage of ensuring the existence of a global minimum and the optimization of a reliable generalization bound [17], [18]. The linear ε-insensitive loss function is given as follows [17], [18]:

L_ε(y, f(x)) = max{0, |y − f(x)| − ε}    (4)

where ε is predefined. Equation (4) defines the ε-tube: if the predicted value is within the tube, the loss is zero, while if the predicted point is outside the tube, the loss is the amount by which the difference between the predicted value and the target exceeds the radius ε of the tube. In order to minimize the sum of the linear ε-insensitive losses while controlling the size of w for a fixed training set, the objective function can be given as [17], [18]:

minimize  (1/2)‖w‖² + C Σ_{i=1}^{ℓ} L_ε(yi, f(xi))    (5)

where C is a parameter measuring the trade-off between the complexity and the losses. Since the objective function (5) is convex, it has no local minima and it guarantees the global minimum, which is one of the advantages of the SVRM over other regression
methods, especially neural networks. Substituting (4) into (5) and introducing the two slack variables ξi and ξ̂i, equation (5) is transformed into the following soft-margin primal optimization problem [17], [18]:

minimize  (1/2)‖w‖² + C Σ_{i=1}^{ℓ} (ξi + ξ̂i)    (6a)

subject to  yi − w·φ(xi) − b ≤ ε + ξi,  w·φ(xi) + b − yi ≤ ε + ξ̂i,  ξi, ξ̂i ≥ 0,  i = 1, ..., ℓ    (6b)

The optimization problem in (6a), (6b) can be solved more easily in its dual formulation. In this stage, a Lagrangian function L is constructed by combining the soft-margin primal objective function in (6a) with the constraints in (6b) and introducing the Lagrangian multipliers αi, α̂i, ηi, η̂i. The function L has a saddle point with respect to the primal variables w, b, ξi, ξ̂i and the positive dual variables αi, α̂i, ηi, η̂i at the optimal solution. By substituting the saddle-point conditions with respect to the primal variables into the Lagrangian function L, we obtain the following equivalent dual-space objective function to be maximized [17], [18]:

maximize  W(α, α̂) = −ε Σ_{i=1}^{ℓ} (αi + α̂i) + Σ_{i=1}^{ℓ} yi (αi − α̂i) − (1/2) Σ_{i=1}^{ℓ} Σ_{j=1}^{ℓ} (αi − α̂i)(αj − α̂j) K(xi, xj)    (8a)

subject to  Σ_{i=1}^{ℓ} (αi − α̂i) = 0,  0 ≤ αi, α̂i ≤ C,  i = 1, ..., ℓ.    (8b)
In (8a), (8b) the dual variables αi, α̂i are bounded below by 0 and above by C through the saddle-point conditions.
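The linear ε-insensitive loss of (4) is simple enough to sketch directly; the helper below is illustrative and not part of the LIBSVM implementation employed in the case study:

```python
def eps_insensitive_loss(y, f_x, eps):
    """Linear eps-insensitive loss of (4): zero inside the eps-tube,
    |y - f(x)| - eps outside it."""
    return max(0.0, abs(y - f_x) - eps)

# A prediction 0.05 away from the target incurs no loss for eps = 0.1,
# while a prediction 0.3 away is charged only for the excess over eps.
inside = eps_insensitive_loss(1.0, 1.05, 0.1)
outside = eps_insensitive_loss(1.0, 1.3, 0.1)
```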
The corresponding Karush-Kuhn-Tucker (KKT) complementarity conditions are [17], [18]:

αi (ε + ξi − yi + f(xi)) = 0    (9a)
α̂i (ε + ξ̂i + yi − f(xi)) = 0    (9b)
ξi ξ̂i = 0,  αi α̂i = 0    (9c)

We must find the 2ℓ Lagrangian multipliers αi, α̂i. For this purpose, the nonzero Lagrangian multipliers are first determined using the KKT conditions and substituted into (8a) and (8b); then the obtained objective is maximized with respect to the Lagrangian multipliers in the feature space F. From (9a) and (9b) of the KKT conditions, it follows that the Lagrangian multipliers may be nonzero only for the samples satisfying |f(xi) − yi| ≥ ε, while for the samples with |f(xi) − yi| < ε the Lagrangian multipliers vanish. Since the products of the ξi and ξ̂i are zero in (9c), at least one of these terms is zero. The sampled data (xi, yi) that come with the non-vanishing Lagrangian multipliers are called Support Vectors (SVs); thus there can be at most ℓ SVs. These SVs may be named "Characteristic SVs" in our case, since they make it possible to express a transistor characteristic parameter, either a signal or a noise parameter, analytically in a wide range of the operation domain. Then the mapping function f(x) between the input variable space and the desired output variable can be expressed in terms of the SVs within the kernel K(xi, x), where the Radial Basis Function (RBF) is chosen as the kernel function in the case study, nSV is the number of the characteristic SVs (nSV ≤ ℓ) and b is determined using the KKT conditions (10a) and (10b). Thus we have for f(x) an analytical function mapping the n-dimensional x input space to the one-dimensional y output space, in terms of the nSV characteristic SVs within the RBF kernel domain, as follows:

f(x) = Σ_{i=1}^{nSV} (αi − α̂i) K(xi, x) + b,  K(xi, x) = exp(−‖x − xi‖² / (2σ²))    (11)

where nSV is the number of characteristic SVs obtained by applying the ε-tube SVRM selection process to the training dataset (xi, yi), i = 1, 2, ...,
ℓ. In the next section a small-signal transistor will be characterized as a black-box for use in small-signal amplification. In Sec. 4, as a case study, f(x) will be obtained for the magnitude and phase of each Scattering (S-) or Noise (N-) parameter of the typical LNA transistors ATF551M4 and VMMK1225, using their manufacturer's data belonging to the typical currents of only a single bias voltage in the middle region of the device operation domain of (VDS, IDS, f). Furthermore, an error-metric analysis of both transistors will be given in comparison with the counterpart method GRNN, which has also been worked out for transistor modeling by our research group in [8][9][10].
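The ε-tube regression and the sparse expansion (11) can be illustrated with a short sketch using scikit-learn's SVR, which wraps LIBSVM; the 1-D synthetic data merely stands in for measured device data, and all numeric settings are illustrative:

```python
# Sketch of the eps-tube SVRM: fit, count the support vectors (sparseness),
# and reproduce the prediction via the expansion (11) by hand.
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X.ravel())

sigma = 0.1                                    # RBF spread parameter
model = SVR(kernel="rbf", C=100.0, epsilon=0.05,
            gamma=1.0 / (2.0 * sigma ** 2))    # gamma = 1/(2*sigma^2)
model.fit(X, y)

# Only samples on or outside the eps-tube survive as (characteristic) SVs.
n_sv = len(model.support_)

# Manual evaluation of f(x) = sum_i (a_i - a_i_hat) K(x_i, x) + b
x_new = np.array([[0.33]])
K = np.exp(-np.sum((model.support_vectors_ - x_new) ** 2, axis=1)
           / (2.0 * sigma ** 2))
f_manual = (model.dual_coef_[0] @ K) + model.intercept_[0]
```

The manual sum over `dual_coef_` (the differences αi − α̂i) and `support_vectors_` agrees with `model.predict`, which is exactly the sparse analytical form claimed in (11).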

Black-Box Characterization Parameters
The black-box representation of a small-signal transistor with its terminations and port reflections is shown in Fig. 1. Here the signal and noise performance of a small-signal transistor are given by the scattering S̄(VDS, IDS, f) and noise N̄(VDS, IDS, f) functions in the device operation domain of the bias condition (VDS, IDS) and frequency f. Thus the measured S̄, N̄ data at the discrete frequencies throughout the operational band at a bias condition (VDS, IDS) can be arranged in table-form functions as follows:

S̄(VDS, IDS, fi) = { fi; |S11|i, φ11,i; |S21|i, φ21,i; |S12|i, φ12,i; |S22|i, φ22,i }    (12.1)
N̄(VDS, IDS, fi) = { fi; Fmin,i; |Γopt|i, φopt,i; rn,i }    (12.2)
where i denotes each data sample, φ is the phase value of the selected S/N parameter and |·| is the magnitude value of the selected S/N parameter.
In this work, the S̄, N̄ data defined by (12.1), (12.2), belonging to the typical currents of only a single bias voltage in the middle region of the device operation domain of (VDS, IDS, f), are found to be sufficient for training the SVRM. Then the characterization vectors S̄(VDS, IDS, f), N̄(VDS, IDS, f) at any desired frequency f and any bias condition (VDS, IDS) can be obtained from the network output by inputting that (VDS, IDS) and the frequency f. Thus the transistor under consideration is characterized continuously all throughout the device operation domain using a greatly reduced amount of sampling data.
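The single-voltage training scheme can be sketched as follows: one SVRM is trained per S/N-parameter component on (VDS, IDS, f) samples at the middle bias voltage only, then queried at other voltages. The surface `toy_s21_mag` and all numeric settings are illustrative stand-ins, not manufacturer data:

```python
# Sketch: train at V_DS = 2.7 V only (I_DS = 10, 15, 20 mA), then query
# the model at an unseen bias voltage, mimicking the extrapolation scheme.
import numpy as np
from sklearn.svm import SVR

def toy_s21_mag(vds, ids, f):
    # smooth synthetic surface playing the role of a measured |S21|
    return 3.0 + 0.1 * vds + 0.02 * ids - 0.05 * f

f_grid = np.linspace(0.1, 18.0, 24)   # GHz, grid density like a datasheet
X_train = np.array([[2.7, ids, f]
                    for ids in (10.0, 15.0, 20.0) for f in f_grid])
y_train = np.array([toy_s21_mag(*row) for row in X_train])

model = SVR(kernel="rbf", C=100.0, epsilon=0.01, gamma=0.05)
model.fit(X_train, y_train)

# Extrapolation query at an unseen bias voltage (V_DS = 3 V, 20 mA, 5 GHz)
pred = model.predict(np.array([[3.0, 20.0, 5.0]]))[0]
```

In the paper's setting one such model is built for the magnitude and one for the phase of every S- and N-parameter, giving a small dictionary of per-parameter extrapolators.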

Performance Measure Functions
Once the characterization S̄, N̄ functions are determined, the signal and noise performance of the transistor are known for any input and output (ΓS, ΓL) termination couple at any operation condition (VDS, IDS, f). There are four functions measuring the performance of the small-signal transistor (Fig. 1) [2]:

(1) The transducer power gain GT of an active device is defined as the ratio of the power PL delivered to the load to the maximum power PAVS available from the source; it is a function of the (ΓS, ΓL) termination couple and the S̄ parameters as follows:

GT(ΓS, ΓL) = (1 − |ΓS|²) |S21|² (1 − |ΓL|²) / (|1 − ΓS Γin|² |1 − S22 ΓL|²)

(2), (3) Min and Mout are mismatching functions that measure the net powers entering the input and the load, respectively. Their dependence on the device signal parameters S̄ and the termination couple (ΓS, ΓL) can be given as follows:

Min(ΓS) = (1 − |ΓS|²)(1 − |Γin|²) / |1 − ΓS Γin|²,
Mout(ΓL) = (1 − |ΓL|²)(1 − |Γout|²) / |1 − ΓL Γout|²

where the input Γin and output Γout reflection coefficients are given in terms of the device signal parameters S̄ and the termination couple (ΓS, ΓL):

Γin = S11 + S12 S21 ΓL / (1 − S22 ΓL),  Γout = S22 + S12 S21 ΓS / (1 − S11 ΓS)
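These textbook two-port relations can be sketched numerically as follows; the S-parameter values in the example are illustrative, not a measured bias point of either transistor:

```python
# Input/output reflection coefficients and transducer power gain of a
# two-port described by its S-parameters, terminated in (g_s, g_l).

def gamma_in(s11, s12, s21, s22, g_l):
    return s11 + (s12 * s21 * g_l) / (1.0 - s22 * g_l)

def gamma_out(s11, s12, s21, s22, g_s):
    return s22 + (s12 * s21 * g_s) / (1.0 - s11 * g_s)

def transducer_gain(s11, s12, s21, s22, g_s, g_l):
    g_in = gamma_in(s11, s12, s21, s22, g_l)
    num = (1.0 - abs(g_s) ** 2) * abs(s21) ** 2 * (1.0 - abs(g_l) ** 2)
    den = abs(1.0 - g_s * g_in) ** 2 * abs(1.0 - s22 * g_l) ** 2
    return num / den

# With matched 50-ohm terminations (g_s = g_l = 0) the gain reduces to |S21|^2
gt = transducer_gain(0.3 - 0.2j, 0.05j, 2.0 + 1.0j, 0.4 - 0.1j, 0.0, 0.0)
```

The matched-termination check (GT = |S21|² when ΓS = ΓL = 0) is a quick sanity test that the implementation agrees with the formulas above.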
(4) The noise figure F of an active device is defined as the ratio of signal-to-noise ratios available at input and output; the noise vector N  describes the dependence of the

Case Study
In this work, the proposed method is implemented in the modeling of the signal and noise parameters of the two transistors ATF551M4 and VMMK-1225, and the library LIBSVM for Support Vector Machines is employed [20], [21]. ATF551M4 has previously been modeled using the SVRM in an interpolation process in [7]. On the other hand, VMMK-1225 is properly selected since it is a typical LNA transistor with manufacturer's data given in a large operation bandwidth of 2-45 GHz at the bias currents from the lower range up to the upper range of IDS = 5, 10, 15, 20 mA at the bias voltages of VDS = 2, 3, 4 V [22].
Firstly let us consider ATF551M4. The 24 S-data for the ATF551M4 transistor are supplied over the range 0.1-18 GHz at each bias (VDS, IDS) condition, where the bias current is given as IDS = 10, 15 and 20 mA at each bias voltage of VDS = 2, 2.7, 3 V by the manufacturer's datasheets. On the other hand, 15 N-data are supplied over the range 0.5-10 GHz at each of the same bias (VDS, IDS) conditions. In Tab. 1, the training and test datasets are given for the SVRM modeling with respect to the bias condition. The data belonging to the 2.7 V DC bias condition are given to the SVRM model for training purposes only, while the 2 and 3 V DC bias data are used after the training process for test/validation purposes. Thus the total measured data provided by the manufacturer's datasheets are separated into two different datasets for the training and test processes. In Tab. 2 the number of characteristic SVs is given for each SVRM model of the magnitude and phase of each S- and N-parameter; the spread parameter σ of the radial kernel function in (11) is taken to be equal to 0.1 for the lowest Mean Absolute Errors (MAEs), using (17) and (18), listed in Tab. 3, 4. As seen from the results given in Tab. 3, 4, the overall performance of the SVRM model is higher than that of its counterpart method GRNN, particularly in the S-parameter domain, where the variations of the parameters over the operation frequency bandwidth are more complex than those of the N-parameters for the same DC bias conditions. The comparative error metrics in Tab. 3 and 4 use the definitions given by (17) and (18) in the calculation of the Mean Absolute Error (MAE) and the Relative Mean Error (RME) for both the SVRM and GRNN methods:

MAE = (1/N) Σ_{i=1}^{N} |Pi − Ti|    (17)
RME = (1/N) Σ_{i=1}^{N} |Pi − Ti| / |Ti|    (18)

where Pi and Ti are the ith predicted and target phasor values, respectively, and N is the sampling number for the validation of the SVRM or GRNN model. In Figs.
2, 3, the predicted scattering parameters obtained from the Characteristic SV-based sparse regression are compared on the Smith chart and the polar plane with the target values for the bias conditions (2 V, 10 mA) and (3 V, 20 mA), respectively. Furthermore, in Figs. 4 and 5, the predicted results of the typical parameters S11 phase and S21 magnitude are compared with all the test samples belonging to the bias voltages VDS of 2 V and 3 V, respectively. The total measured data provided by the manufacturer's datasheets are separated into two different datasets for the training and test processes. The reduced number of characteristic support vectors corresponding to each scattering and noise parameter is given in Tab. 6. As seen from the results given in Tab. 7, 8, similarly to the case of the modeling of ATF551M4, again for VMMK1225 the overall performance of the SVRM modeling method is higher than that of its counterpart method, GRNN. Thus one can infer that S- and N-parameter modeling based on Characteristic Support Vector-based sparse regression (SVRM) is a fast and reliable modeling method requiring substantially reduced measurements and human effort.
Tables 7 and 8 give the error metrics of the S- and N-parameters compared with the values resulting from the counterpart method GRNN, respectively. In Figs. 7 and 8, the predicted samples fit snugly to the targeted values. Furthermore, in Figs. 9 and 10, the predicted results of the typical parameters S11 magnitude and S21 phase are compared with all the test samples belonging to the bias voltages VDS of 2 and 4 V, respectively. In Fig. 11, the predicted noise parameters of VMMK1225 using the Characteristic SV-based sparse regression are compared with the target values for the bias condition (4 V, 20 mA). Once again, the obtained results suggest that the sparse modeling of microwave transistors with the SVRM method is an effective and efficient method for the transistor's S- and N-parameters.
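The error metrics of (17) and (18) can be sketched for complex (phasor) samples as follows; since the RME formula is stated only by name in the tables, the per-sample relative form used here is an assumption, and the sample arrays are illustrative:

```python
# MAE and RME over complex phasor predictions, as used for the
# comparative tables (RME definition assumed per-sample relative).
import numpy as np

def mae(pred, target):
    pred, target = np.asarray(pred), np.asarray(target)
    return float(np.mean(np.abs(pred - target)))

def rme(pred, target):
    pred, target = np.asarray(pred), np.asarray(target)
    return float(np.mean(np.abs(pred - target) / np.abs(target)))

t = np.array([1.0 + 1.0j, 2.0 + 0.0j, 0.0 + 0.5j])   # illustrative targets
p = t + 0.1                                           # uniform 0.1 offset
print(mae(p, t))                                      # close to 0.1
```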

Conclusion
SVRMs are novel linear learning machines in the kernel-induced feature space that combine convex optimization theory with generalization; thus they guarantee the global minimum in the optimization procedure. Furthermore, the solution has sparseness; in other words, the solution can be expressed as a continuous function in terms of an informative subset of the training data, the so-called Characteristic Support Vectors (SVs). This sparseness of the solution facilitates a fine and fast sparse regression for suitable regression problems such as the modeling of microwave transistors; at the same time it also provides a significant tool to characterize a large amount of data. These two features give SVRMs distinctive superiorities over other classification and regression methods, such as the commonly used BPNN methods.
In this work, for the first time in the literature, SVRMs are employed successfully as nonlinear extrapolators in the black-box modeling of the Scattering and Noise parameters of a microwave transistor, as counterparts to the Generalized Regression Neural Networks (GRNNs) [8][9][10]. Thus this work can be considered to make the following significant contribution to transistor modeling: the magnitude and phase of each characterization S- or N-parameter can be expressed as a continuous function throughout the device operation domain of (VDS, IDS, f) using only a subset of the reduced training data, the so-called Characteristic SVs. In the modeling process, the data measured at the typical currents of only a single bias voltage in the middle region of the device operation domain of (VDS/VCE, IDS/IC, f) are found to be sufficient for training the SVRM. Furthermore, a detailed error-metric analysis is made between the two nonlinear extrapolators SVRM and GRNN. From this comparative error-metric analysis, it can be observed that the SVRM is superior to the counterpart GRNN in extrapolation performance; furthermore the SVRM extrapolation results in an analytical expression, while the GRNN predicts on a probability basis. It can be concluded that this work puts forward a revolutionary approach in transistor modeling, since Characteristic Support Vector-based sparse regression results in an accurate analytical expression using a greatly reduced amount of training data and human effort.

Fig. 1. Black-box representation of a small-signal transistor and its port impedances.

transistor noise figure F on the input termination (source) reflection coefficient ΓS. These are linked through the following relationship:

F(ΓS) = Fmin + (4 RN / Z0) |ΓS − Γopt|² / ((1 − |ΓS|²) |1 + Γopt|²)
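The noise-figure relation can be sketched numerically as follows; the parameter values are illustrative, not VMMK-1225 or ATF551M4 data:

```python
# Two-port noise figure versus source reflection coefficient g_s, with
# minimum noise figure f_min, equivalent noise resistance r_n (ohms),
# optimum source reflection g_opt and reference impedance z0.

def noise_figure(g_s, f_min, r_n, g_opt, z0=50.0):
    num = 4.0 * (r_n / z0) * abs(g_s - g_opt) ** 2
    den = (1.0 - abs(g_s) ** 2) * abs(1.0 + g_opt) ** 2
    return f_min + num / den

# At the optimum source termination g_s = g_opt, F collapses to F_min.
f_opt = noise_figure(0.4 + 0.2j, f_min=1.2, r_n=10.0, g_opt=0.4 + 0.2j)
```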

Fig. 4. S11 phase throughout the entire test data domain of ATF551M4.

Fig. 5. S21 magnitude throughout the entire test data domain of ATF551M4.

In Fig. 6, the predicted noise parameters of ATF551M4 using the Characteristic SV-based sparse regression are compared with the target values for the bias condition (2 V, 10 mA). Similarly to the modeling of ATF551M4, the Characteristic Support Vector-based sparse regression is also implemented in the modeling of another LNA transistor, VMMK1225, within a large operation bandwidth of 2-45 GHz at the bias currents from the lower range up to the upper range of IDS = 5, 10, 15, 20 mA at the bias voltages of VDS = 2, 3, 4 V [22]. 692 data belonging to the typical bias currents of 5, 10, 15, 20 mA at the middle bias voltage of 3 V are used for the S-parameter training, and the model is tested with the remaining 1384 data, as given in Tab. 5. The N-parameters of VMMK1225 are given within the 2-17 GHz frequency range in the manufacturer's datasheet. Similarly, the N-parameter model is built by training with the central 64 data and tested with the remaining 128 data, as in Tab. 5. Similarly to the case of the modeling of ATF551M4, the data belonging to the 3 V DC bias condition are given to the SVRM model for training purposes only, while the 2 and 4 V DC bias voltage data are used after the training process for test/validation purposes.

V DS /I DS 10 mA 15 mA 20 mA
Tab. 1. Training and test data set for ATF551M4.
Tab. 5. Training and test data set for VMMK-1225.
Tab. 6. Number of the characteristic SVs in the modeling of VMMK1225.
Tab. 7. Error metric of scattering parameters for SVRM extrapolation of VMMK1225.
Tab. 8. Error metric of noise parameters for SVRM extrapolation of VMMK1225.