A Generic and Efficient Globalized Kernel Mapping-Based Small-Signal Behavioral Modeling for GaN HEMT

The work reported in this article explores a novel Particle Swarm Optimization (PSO) tuned Support Vector Regression (SVR) based technique to develop a small-signal behavioral model for the GaN High Electron Mobility Transistor (HEMT). The proposed technique investigates issues such as kernel selection and model optimization usually encountered when SVR is applied to model GaN-based HEMT devices. Here, the PSO algorithm is utilized to find the optimal hyperparameters that minimize the fitness function. To quantify the efficiency and the generalization capability of the predictors, the performance of the model is investigated in terms of mean square error (MSE) and mean relative error (MRE). A very good agreement is found between the measured S-parameters and the proposed model for multiple bias sets over the complete frequency range of 1 GHz to 18 GHz. The proposed technique is also used to test the frequency extrapolation capability of the model. A comparative analysis indicates that the proposed PSO-SVR predictor achieves significantly improved computational efficiency and overall prediction accuracy. To demonstrate the ready usefulness of the modeling approach, the developed model has been incorporated in a CAD environment using MATLAB co-simulation in ADS Ptolemy. Subsequently, small-signal stability analysis is performed and the gain of a power amplifier configuration designed using the proposed GaN HEMT model is determined.


I. INTRODUCTION
Gallium Nitride (GaN) HEMT is becoming very popular in the design of circuits and components such as the RF Power Amplifier (PA) owing to its unique traits like wide energy bandgap, high breakdown field, and high electron mobility [1]-[7]. However, its use in RF PAs necessitates the development of specification-based large-signal models. It has been established that an accurate large-signal model of GaN HEMT utilizes a bottom-up modeling technique where small-signal models are extremely important [8]-[12]. A number of GaN HEMT small-signal modeling techniques exist and many of these make use of analytical formulations, the cold pinch-off concept, and de-embedding [11]-[21]. These conventional methods, although accurate, are often found to be highly cumbersome and computationally inefficient. Therefore, alternative machine learning (ML) based small-signal modeling techniques are gaining popularity, as their turn-around time is short and their accuracy is very good [22], [23]. A key feature of ML is its ability to predict the outcome very quickly, and this is very appealing for device modeling, especially at RF and microwave frequencies where the inter-dependence of the various device parameters is strong [24], [25]. Furthermore, ML-based device modeling that incorporates device geometry and learns device behavior from more features to build an automated model has the potential to bring a paradigm shift in the way device modeling is carried out. This is because ML-based modeling removes the need to solve the simultaneous complex equations involved in conventional methods, reduces computation time, exhibits better accuracy and prediction ability, and enhances production yield [26].
The associate editor coordinating the review of this manuscript and approving it for publication was Yin Zhang.
More recently, Artificial Neural Network (ANN) and Support Vector Regression (SVR) techniques have been used in the development of small-signal models for GaN HEMT [27]-[34]. However, SVR is considered better than ANN because it yields a global optimum, whereas ANN training may settle at a local optimum [23], [25], [32]-[37]. Moreover, SVR possesses a geometrical interpretation whereas ANN is based on the tedious description of various parameters. In addition, SVR exhibits robustness to noise and excellent generalization capability (providing a remedy for underfitting and overfitting) and is therefore preferred over ANN in the device modeling regime. At this point, it is important to note that SVR-based models also suffer from multiple issues [32]-[37]. Some of these can be attributed to the selection of an inappropriate kernel function and a random set of hyperparameters, which leads to poor prediction ability of the model. The selection of the kernel function depends on the dataset, and uncertainty in the dataset makes it difficult to determine which kernel function is most suitable for model development. On the other hand, a set of hyperparameters that is not selected properly may lead to underfitting or overfitting. In addition, SVR is not appropriate for massive datasets because of the large Gram matrices associated with the kernels that must be processed [25].
Keeping the above issues in perspective, a novel technique making use of Particle Swarm Optimization (PSO) and SVR is developed to model the behavior of GaN HEMT under small-signal conditions. First, a kernel function library is constructed so that an optimized kernel function can be selected according to the problem requirement. Subsequently, PSO is utilized for optimization, considering that the performance of SVR depends on its kernel function and hyperparameters. PSO is used here because it possesses attractive features such as simple search rules, easy implementation, fewer parameters to adjust, no effect of initial conditions on its computational behavior, and fast convergence in comparison to other techniques such as the Genetic Algorithm (GA) and the resampling technique. In addition, the PSO ensures convergence to the global optimum instead of the local optimum observed in GA and other local search techniques. Furthermore, PSO is a population-based computational algorithm which evaluates the fitness function at each particle (the population being a collection of individuals) as the particles move in steps throughout the region. The algorithm then evaluates the new position and velocity of each particle, and the iterative procedure continues as the particles move further and the algorithm re-evaluates. Overall, the main contributions of this article are: (a) development of a novel PSO-SVR based small-signal behavioral model for the GaN HEMT device, (b) validation of the prediction ability of the model by frequency and geometric extrapolation, (c) robustness evaluation of the proposed model by adding random noise, and (d) integration of the proposed transistor model in a CAD environment for circuit simulation and analysis to demonstrate the usefulness of the model. The next section elaborates the basic SVR method and the PSO-SVR model, whereas the development of the framework for GaN HEMT under small-signal conditions is explained in Section III.
The experimental results, model validation, and comparison with the state-of-the-art are presented in Sections IV and V respectively while Section V concludes the paper.

II. SUPPORT VECTOR REGRESSION & PARTICLE SWARM OPTIMIZATION ALGORITHM
A. SUPPORT VECTOR REGRESSION (SVR)
The support vector machine (SVM) was invented in 1963 by Vapnik and Chervonenkis (VC) and developed over three decades. In 1992, B. Boser, I. Guyon and V. Vapnik proposed an approach to create non-linear classifiers by applying the kernel trick to maximize the margin of the hyperplane. It is a unique class of algorithm characterised by the application of kernels, sparseness of the solution, presence of a global minimum, and capacity controlled by altering the margin. SVR, a regression technique, works on the principle of the SVM but with a few minor differences. Being a regression problem, it uses a curve and seeks the best match between the data vectors and the position of the curve, instead of the decision boundary used in SVM. The SVM is one of the non-parametric techniques based on computation of a linear function in a high dimensional feature space. Here, the inputs are mapped through a nonlinear function called the kernel. For example, if x_i is a multi-variable set of predictors of N observations with response values y_i for a given set of training samples, then the idea behind SVR is to find a function f(x) that deviates by at most ε from the observed targets y_i for the complete set of observations while being as flat as possible, as illustrated in Fig. 1 [38], [39]. To comprehend this aspect, let us consider a linear function f(x) given as
\[ f(x) = w^{T}\psi(x) + b \tag{1} \]
where ψ(x) is a mapping function that maps each input vector x in the input space X to the high dimensional feature space, w is the weight vector, T denotes the transpose and b ∈ R is the bias value. This study focuses on the ε-SVR model. This method uses the concept of an ε-insensitive loss function, where the training data has at most ε deviation from the targets y_i [38]-[42]. The loss function calculates the distance between the observed value y and the ε boundary, treating as zero those errors that lie within ε distance of the observed value.
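The ε-insensitive loss described above can be illustrated with a minimal Python sketch. The identity feature map (ψ(x) = x), the function names, and the numerical values of w, b and ε are assumptions chosen only for illustration; they are not taken from the proposed model.

```python
def linear_model(w, x, b):
    """Evaluate f(x) = w^T x + b, assuming an identity feature map psi(x) = x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors inside the eps-tube cost nothing; outside, the cost grows linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical weights, bias and tube width (illustrative values only).
w, b, eps = [0.5, -1.0], 0.2, 0.1
pred = linear_model(w, [2.0, 1.0], b)                 # 0.5*2 - 1*1 + 0.2
loss_inside = eps_insensitive_loss(0.25, pred, eps)   # deviation below eps
loss_outside = eps_insensitive_loss(0.5, pred, eps)   # deviation above eps
```

Only the second sample contributes to the loss, which is what makes the SVR solution sparse: samples inside the tube do not become support vectors.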
To address the training samples that lie beyond the ε-insensitive zone, slack variables ξ_i and ξ_i^* can be introduced at each point, as in soft-margin classification. Therefore, the primal formula is restated in (2) and the applicable constraints in (3). The dual formula can then be framed by constructing the Lagrangian function from the primal function and its constraints by introducing positive multipliers. The terms α_m, α_m^*, η_m and η_m^* in (4) are nonnegative Lagrangian multipliers [39], [40]. Setting the partial derivatives of (4) with respect to the primal variables (w, b, ξ_i, ξ_i^*) to zero for optimality and substituting those values back in (4) yields the dual formula expressed in (5). It leads to expression (6), which needs to satisfy the complementary Karush-Kuhn-Tucker (KKT) conditions expressed by equations 7(a)-(c) [39]-[41]. Now, the best estimated function f(x), or the optimal solution of the convex optimization, expressed by (7), can be derived by solving (5) and (6) using the well-known Sequential Minimal Optimization (SMO) algorithm. Here, K(·) denotes the kernel function, x_j is the support vector, n is the number of support vectors, the term in brackets represents the weight coefficient of the support vectors [41], [42], and b is the bias.
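For orientation, the standard textbook form of the ε-SVR problem is sketched below in the notation used above. This is the commonly published formulation and is given under the assumption that the article follows it; the exact notation and numbering of expressions (2)-(7) may differ.

```latex
% Primal problem with slack variables (textbook form, assumed)
\min_{w,\,b,\,\xi,\,\xi^{*}} \;\; \frac{1}{2}\lVert w \rVert^{2}
      + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right)
\quad \text{subject to} \quad
\begin{cases}
  y_i - w^{T}\psi(x_i) - b \le \varepsilon + \xi_i \\
  w^{T}\psi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\
  \xi_i,\ \xi_i^{*} \ge 0
\end{cases}

% Dual problem in terms of the kernel K(x_i, x_j) = \psi(x_i)^{T}\psi(x_j)
\max_{\alpha,\,\alpha^{*}} \;\;
  -\frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}
     (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, K(x_i, x_j)
  - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*})
  + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^{*})
\quad \text{subject to} \quad
  \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) = 0, \qquad
  0 \le \alpha_i,\ \alpha_i^{*} \le C

% Resulting regression function over the n support vectors
f(x) = \sum_{j=1}^{n} \left( \alpha_j - \alpha_j^{*} \right) K(x_j, x) + b
```

In this form the bracketed difference of multipliers is the weight coefficient of each support vector, and it is non-zero only for samples on or outside the ε-tube.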
Different kernel functions implicitly map the predictors to a high dimensional feature space and capture the nonlinearity of the model or the system. The widely used kernel functions with their ranges are included in Table 1. Here, q and γ are kernel parameters. The term q, an integer quantity, determines the degree or order of a polynomial kernel, whereas γ = 1/(2σ^2) plays the role of a scaling parameter for the Gaussian or Radial Basis Function (RBF) kernel.
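As an illustration of how such kernels are evaluated, the sketch below implements three commonly used kernels in plain Python. The exact forms listed in Table 1 are not reproduced here, so the polynomial offset `c` and the function names are assumptions made for this example.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x^T z."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, q=2, c=1.0):
    """K(x, z) = (x^T z + c)^q, with integer degree q; the offset c is assumed."""
    return (linear_kernel(x, z) + c) ** q


def rbf_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2) with gamma = 1 / (2 sigma^2)."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, kernel):
    """Pairwise kernel evaluations; its size grows as the square of the sample count."""
    return [[kernel(a, b) for b in samples] for a in samples]


samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
gram = gram_matrix(samples, rbf_kernel)
```

The quadratic growth of the Gram matrix is the reason SVR becomes costly on very large datasets.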

B. PARTICLE SWARM OPTIMIZATION ALGORITHM
The PSO is a population-based evolutionary computational algorithm [15], [43]-[45]. A collection of individuals known as particles moves throughout the search region. The optimization process begins with a randomly initialized population of candidate solutions, the particles. The swarm consists of n particles, and an initial velocity is assigned to each particle. The algorithm then assesses the fitness function at each particle's location and determines the best value and the corresponding position. Subsequently, it selects modified velocities based on the current velocity, the best position of the individual particle, and the best position of its neighbors. The locations of the particle and its neighbors and the particle velocities are updated iteratively until the algorithm reaches a stopping criterion.
Let the swarm consist of n particles, with each particle position defined by a vector x_i = (x_{i,1}, x_{i,2}, ..., x_{i,d}) and each particle velocity defined by a vector v_i = (v_{i,1}, v_{i,2}, ..., v_{i,d}), where i = 1, 2, ..., n and d is the number of dimensions of each vector. It is pertinent to mention that the PSO has two operators, namely the position operator and the velocity operator. During each generation, the best position of the particle is denoted by the vector P_{best,i} = (p_{i,1}, p_{i,2}, ..., p_{i,d}) and the global best position of the population is represented by the vector P_{best,g} = (p_{g,1}, p_{g,2}, ..., p_{g,d}). The updated velocity and position of the particle can be respectively obtained using the following expressions
\[ v_i(t+1) = w\,v_i(t) + c_1 r_1 \left( P_{best,i} - x_i(t) \right) + c_2 r_2 \left( P_{best,g} - x_i(t) \right) \]
\[ x_i(t+1) = x_i(t) + v_i(t+1) \]
where t denotes the generation, w represents the inertia weight, c_1 and c_2 are acceleration constants, and r_1 and r_2 are randomly generated values in the range [0, 1]. The inertia weight scales the contribution of the current velocity, and its value decreases linearly with successive generations. Higher values of the inertia weight improve global search while lower values facilitate local search. The linearly decreasing inertia weight (LDIW) proposed by Y. Shi, represented by (10), is used for the proposed approach.
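A single PSO update with a linearly decreasing inertia weight can be sketched as follows. The bounds `w_max = 0.9` and `w_min = 0.4` and the default acceleration constants are commonly quoted values and are assumptions here, as are the function names; the settings actually used in this work are those of Table 2.

```python
import random


def ldiw(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight; w_max and w_min are assumed values."""
    return w_max - (w_max - w_min) * t / t_max


def update_particle(x, v, p_best, g_best, w, c1=2.0, c2=2.0, rng=random):
    """Return the new (position, velocity) of one particle, dimension by dimension."""
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()          # fresh random factors in [0, 1)
        vd_new = w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        new_v.append(vd_new)
        new_x.append(xd + vd_new)
    return new_x, new_v


# One step for a hypothetical 3-D particle {C, sigma, eps}.
rng = random.Random(1)
x, v = [10.0, 1.0, 0.1], [0.0, 0.0, 0.0]
x_new, v_new = update_particle(x, v, p_best=[8.0, 0.8, 0.05],
                               g_best=[5.0, 0.5, 0.02], w=ldiw(0, 100), rng=rng)
```

A large inertia weight early in the run keeps particles exploring, while the smaller weight near the end lets them settle around the best positions found.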

III. DEVELOPMENT OF PSO-SVR BASED GaN HEMT BEHAVIORAL MODELLING TECHNIQUE
A. DEVICE LAYOUT AND CHARACTERIZATION
The fabricated AlGaN/GaN HEMT device, not shown here due to manufacturer's policy, of geometries 2 × 200 µm and 4 × 100 µm is shown in Fig. 2. These devices, grown on SiC substrate, are fabricated by Defense Research Development Organization (DRDO, an enterprise of the Ministry of Defense, Govt. of India). The characterization set-up includes a Microwave Network Analyzer, a probe station and DC power supplies. The bias is provided at the gate and the drain terminals of the device, with the source terminal grounded, through DC power supplies using bias tees internal to the network analyzer. The characterization is done under the small-signal 10 V for a frequency range of 1-18 GHz. The two-port S-parameters are recorded in terms of magnitude and phase for the provided multi-bias and frequency range. The RF characteristics of any transistor can be studied in terms of the unity current gain frequency (f_t) and the maximum oscillation frequency (f_max). The f_t is extracted by converting the S-parameters to the current gain h-parameter h_21 and setting it equal to unity, whereas f_max is extracted by setting the maximum unilateral gain equal to unity. As seen from Fig. 3, the maximum values of f_t and f_max obtained for the class A and class AB operated 4 × 100 µm GaN HEMT are approximately 21 GHz and 52 GHz, respectively, using the expression defined in [46].
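The conversion from S-parameters to h_21 mentioned above can be sketched with the standard two-port relation. The single-point f_t estimate assumes that |h_21| rolls off at 20 dB per decade, which is an assumption of this example and not necessarily the extraction expression of [46]; the S-parameter values are hypothetical.

```python
def s_to_h21(s11, s12, s21, s22):
    """Standard two-port conversion from S-parameters to the current gain h21."""
    return -2.0 * s21 / ((1.0 - s11) * (1.0 + s22) + s12 * s21)


def estimate_ft(freq_hz, h21):
    """Single-point f_t estimate, assuming |h21| falls at -20 dB/decade."""
    return freq_hz * abs(h21)


# Hypothetical S-parameters at 5 GHz (illustrative values, not measured data).
s11, s12, s21, s22 = 0.6 - 0.5j, 0.05 + 0.04j, -3.0 + 2.0j, 0.4 - 0.3j
h21 = s_to_h21(s11, s12, s21, s22)
ft_estimate = estimate_ft(5e9, h21)
```

In practice f_t is usually read from the frequency at which the measured or extrapolated |h_21| curve crosses unity, so a single-point estimate like this is only a quick sanity check.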
The I_D-V_DS characteristics of the GaN HEMT device characterized under DC operation are shown in Fig. 4. The DC I-V characteristics are used to classify the operating regions of the transistor as ohmic, saturation and cold pinch-off. Under DC operation, the gate bias is stepped up gradually from pinch-off, providing enough time for trapped electrons to respond, unlike the pulsed measurement where the device is first pinched off and electrons are injected into traps. When the channel is turned on by a short pulse under a high electric field, the trapped electrons are unable to respond in time. Therefore, the current is reduced, the knee voltage is increased and the predicted output power reduces.

B. PREPROCESSING
Modeling of the small-signal behavior of GaN HEMT requires mapping of five predictor variables into a high dimensional feature space. The predictors include the number of gate fingers N (N = 2 and N = 4, respectively) and the unit gate width w_g (200 µm and 100 µm, respectively) of the two devices. On the other hand, the eight desired responses are the two-port S-parameters, S_11 (reflection coefficient measured at port 1), S_22 (reflection coefficient measured at port 2), S_12 (transmission coefficient measured at port 1) and S_21 (transmission coefficient measured at port 2), recorded for different sets of inputs in the form of magnitude and phase.
It is imperative to note that multiple transitions are observed in the measured phase values of S_11 and S_22 for a few multi-bias conditions, usually at low V_GS and high V_DS, as shown in Fig. 5, considering that the phase values of S-parameters are recorded by the VNA in the range of [−π, +π]. However, no such anomaly is observed in the magnitude of the S-parameters as these are recorded on a linear scale. This situation necessitates pre-processing of the data. It is well known that data preprocessing is an important unit of a machine learning based model, as shown in Fig. 6, and it is employed for cleaning and smoothing the data before training the model. This stage deals with issues such as inconsistency, outliers and random noise present in the dataset. Scaling the data is also one of the preprocessing steps, applied as required by the regression problem.
In this article, the magnitude of the two-port S-parameters has been scaled logarithmically so that the dataset is distributed uniformly within a small range, which expedites the search for the global optimum and speeds up convergence. It is observed that the phase of S_11 and S_22 is a highly discontinuous curve past 11 GHz. A trigonometric adjustment for positive phase angles (θ_new = θ − 360°) is made to restore the data consistency. Subsequently, the dataset is moved to the outlier detection unit. An outlier is an unwanted sample that behaves differently from the rest of the dataset. Outliers play a major role and can lead to a poor model. The margin will shrink, and a sub-optimal decision boundary may result in poor prediction capability of the model, or issues like underfitting and overfitting may arise. This may affect the regularization constant C, as the outlier-affected model may generate more errors, converting the soft-margin SVM problem into a hard-margin SVM problem.
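The two adjustments described above can be sketched as follows. The use of 20·log10 (a dB scale) for the logarithmic scaling is an assumption, since the base and prefactor are not stated; the function names and sample phase values are likewise illustrative.

```python
import math


def magnitude_to_log(mag):
    """Logarithmic scaling of a linear magnitude; the dB form is an assumed choice."""
    return 20.0 * math.log10(mag)


def adjust_phase_deg(theta_deg):
    """Map positive phase angles to theta - 360 degrees; leave the rest unchanged."""
    return theta_deg - 360.0 if theta_deg > 0.0 else theta_deg


# A hypothetical phase trace that wraps from about -180 to +180 degrees.
raw_phase = [-170.0, -179.0, 178.0, 170.0]
continuous_phase = [adjust_phase_deg(p) for p in raw_phase]
```

After the adjustment the trace decreases monotonically instead of jumping by nearly 360 degrees, which removes the artificial discontinuity the regressor would otherwise have to fit.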
In this work, the condition for detection of an outlier, expressed in (11), is based on the statistical (stochastic) assumption that the data is normally distributed. The idea is to learn a model that fits the provided dataset, so that objects detected in the error region of the model are termed outliers.
Here, erfc^{-1}(x) = erf^{-1}(1 − x) is known as the inverse complementary error function. The error function is obtained by integrating the normalized Gaussian distribution. The normal distribution function N(x) gives the probability that a variate assumes a value in the interval [0, x]. Subsequently, the detected outliers are filled using linear interpolation of the adjacent non-outlier values. The phase variables are scaled in the range [0.01, 1] using
\[ b_{scaled} = c_{min} + \frac{(b - b_{min})(c_{max} - c_{min})}{b_{max} - b_{min}} \tag{13} \]
where c_max is the maximum value of the defined range, c_min is the minimum value of the defined range, b is the value to be scaled, b_max is the maximum value of the response variables to be scaled, and b_min is the minimum value of the response variables to be scaled. After pre-processing, the data can be utilized to train the model.
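The outlier handling and scaling steps can be sketched as below. Because condition (11) is not reproduced in the text, the detection rule used here is an assumption: a point is flagged when it lies more than three scaled median absolute deviations from the median, with the scale factor c = −1/(√2·erfc⁻¹(3/2)) that makes the rule consistent with normally distributed data. All function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist, median


def erfcinv(y):
    """Inverse complementary error function via the standard normal quantile."""
    return -NormalDist().inv_cdf(y / 2.0) / math.sqrt(2.0)


def find_outliers(values, n_mads=3.0):
    """Flag points farther than n_mads scaled MADs from the median (assumed rule)."""
    c = -1.0 / (math.sqrt(2.0) * erfcinv(1.5))       # about 1.4826
    med = median(values)
    scaled_mad = c * median(abs(v - med) for v in values)
    return [abs(v - med) > n_mads * scaled_mad for v in values]


def fill_linear(values, is_outlier):
    """Replace flagged points by linear interpolation of adjacent non-outliers."""
    good = [i for i, flag in enumerate(is_outlier) if not flag]
    filled = list(values)
    for i, flag in enumerate(is_outlier):
        if not flag:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:                              # outlier at the start
            filled[i] = values[right]
        elif right is None:                           # outlier at the end
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def min_max_scale(b, b_min, b_max, c_min=0.01, c_max=1.0):
    """Scale b from [b_min, b_max] into [c_min, c_max]."""
    return c_min + (b - b_min) * (c_max - c_min) / (b_max - b_min)


data = [1.0, 1.1, 0.9, 1.0, 9.0, 1.05, 0.95]
flags = find_outliers(data)
cleaned = fill_linear(data, flags)
scaled = [min_max_scale(v, min(cleaned), max(cleaned)) for v in cleaned]
```

A median-based rule is used in this sketch because, unlike a mean and standard deviation, the median is not dragged toward the very outliers it is meant to detect.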

C. TRAINING OF THE MODEL: SVR HYPER-PARAMETER OPTIMIZATION USING PSO
The flowchart for the PSO-SVR based behavioral model of GaN HEMT is shown in Fig. 7. This section discusses the training stage of the model development. Two-thirds of the data is utilized for training the model while the remaining one-third is reserved for testing the prediction ability of the model. Two devices with distinct geometries, having different numbers of gate fingers, are considered. The remaining 5 GHz is reserved for testing the frequency extrapolation capability of the model. The main idea here is to develop a small-signal behavioral model of GaN HEMT that utilizes an optimal kernel function to transform the data into a high-dimensional feature space that aids in linear separation with the optimal hyperplane (maximum margin). Effective kernel transformation, an architecture shown in Fig. 8, depends on a couple of critical factors, also known as hyperparameters, such as the penalty factor C, the margin of tolerance or epsilon width (ε), and the kernel scaling parameters (q and γ), which eventually decide the optimal hyperplane. The overall optimization problem is a combination of achieving the best fit of the model subject to the condition that most of the samples should stay outside the margin. Therefore, the regularization parameter C is a user-defined input, tuned in such a manner that the model neither overfits completely nor loses its large-margin property. The optimization problem is modified such that it optimizes the fitness function while penalizing the number of samples inside the margin at the same time. Here, C defines the weight of the samples inside the margin, which contribute to the overall error. A lower value of C penalizes fewer samples than a higher value of C. The other vital parameter defining the support vector regression is epsilon (ε), which is defined as the margin of tolerance within which errors are not penalized. Support vectors (SVs) are instances across the margin where samples are penalized and for which the slack variables (ξ_i and ξ_i^*) are non-zero, as discussed in Section II.
It is to be noted that a large value of ε implies that more errors are admitted in the solution. Conversely, a greater number of SVs is required if every error is penalized. Moreover, the epsilon-insensitive loss function ensures the existence of a global minimum and an optimized, reliable generalization bound [38].
Eventually, the goal is to develop a model that minimizes the fitness function and has an optimal hyperplane (better generalization bound) which maximizes the margin when part of the error is tolerated. SVR is a regression problem, and hence it becomes very difficult to obtain the optimal kernel function and hyperparameters that achieve the objective stated above by trial and error, which involves infinitely many possibilities. In the present work, therefore, the Particle Swarm Optimization (PSO) algorithm is used to obtain and select the optimal kernel and associated hyperparameters. Here, optimal refers to the function and hyperparameters that yield the best fitness function value for the model. Owing to attributes such as excellent interpretability, easy implementation and fast convergence, the PSO is preferred over other algorithms. The flow of the PSO-tuned SVR algorithm, depicted in Fig. 6, is described in the subsequent steps.
Step 1: The parameters of PSO such as the acceleration coefficients (c_1, c_2) and the randomly generated coefficients (r_1, r_2) are initialized. Later, the hyperparameters that need to be optimized are generated in feature space. Each i-th particle is represented by {C, σ, ε} and the kernel functions. The initialization parameters and the particle position range are defined in Table 2, whereas p_best is assigned as the initial population. A wide particle position range is chosen to ensure precision in the selection of the optimal hyperparameters.
Step 2: The hyperparameters, namely C, σ, and ε, are set as the optimized variables. For this purpose, the fitness function defined in terms of the mean square error (MSE), expressed by
\[ MSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_{meas,i} - y_{pred,i} \right)^{2} \tag{14} \]
needs to be evaluated. Here, N is the sample size, y_meas represents the measured value and y_pred is the predicted value. Each set of hyperparameters (i.e., particle) is a potential solution to the problem in the search space. The particles share information with each other and tend to shift towards the promising search region by adjusting their search direction. Each set of hyperparameters has its own optimal experience, represented as the best position of the particle p_best,i in the feature space, and the optimal experience derived from the population is represented as the best position of the population, p_best,g. Apart from the performance criterion in (14), the proposed model performance is also evaluated in terms of the correlation coefficient, R, given in (16). The model fits better for R closer to 1.
Step 3: At every optimization step, each set of hyperparameters evaluates its own fitness as well as the fitness of its neighbouring sets of hyperparameters. During this process, each set of hyperparameters accelerates towards p_best,i and p_best,g. Subsequently, the algorithm evaluates and searches for the minimum value of MSE. If the MSE of a given set of hyperparameters i in the current iteration is lower than that of p_best,i, then p_bestnew,i replaces p_best,i. The MSE is evaluated correspondingly for each set of hyperparameters, and the set with the best MSE value among all sets of hyperparameters is considered the global best. Essentially, p_best,i replaces p_best,g if the MSE value of p_best,i is lower than that of p_best,g. The hyperparameters for which the minimum value of the fitness function is achieved are thus obtained.
Step 4: The velocity and the position of a particle are updated according to (9) and (10) respectively.
Step 5: The optimization process is carried out in sequential manner as mentioned above until the maximum number of iterations is reached or the objective function attains constant value for about next 30 iterations. In this work, maximum number of iterations is set to 1000.
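Steps 1 to 5 can be summarized in a compact sketch of the optimization loop. The fitness function below is a hypothetical stand-in for the validation MSE of an SVR trained with a given {C, σ, ε}; it is used only so that the example runs on its own. Swarm size, inertia bounds, zero initial velocities, the improvement tolerance and the position bounds are likewise assumptions, with the actual settings given in Table 2.

```python
import math
import random


def pso_minimize(fitness, bounds, n_particles=20, max_iter=1000, stall_iter=30,
                 c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, tol=1e-12, seed=0):
    """Minimize fitness over box bounds; stop on max_iter or a stalled global best."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random positions within the bounds, zero initial velocities (assumed).
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    # Step 2: evaluate the fitness; the initial population is the personal best.
    p_best = [x[:] for x in xs]
    p_cost = [fitness(x) for x in xs]
    g_idx = min(range(n_particles), key=p_cost.__getitem__)
    g_best, g_cost = p_best[g_idx][:], p_cost[g_idx]
    stall = 0
    for t in range(max_iter):
        w = w_max - (w_max - w_min) * t / max_iter    # linearly decreasing inertia
        improved = False
        for i in range(n_particles):
            # Step 4: velocity and position update, clamped to the search range.
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d] + c1 * r1 * (p_best[i][d] - xs[i][d])
                            + c2 * r2 * (g_best[d] - xs[i][d]))
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            # Step 3: update personal and global bests when the cost is lower.
            cost = fitness(xs[i])
            if cost < p_cost[i]:
                p_best[i], p_cost[i] = xs[i][:], cost
                if cost < g_cost:
                    improved = improved or (g_cost - cost > tol)
                    g_best, g_cost = xs[i][:], cost
        # Step 5: stop when the global best has not improved for stall_iter iterations.
        stall = 0 if improved else stall + 1
        if stall >= stall_iter:
            break
    return g_best, g_cost


def toy_fitness(p):
    """Hypothetical stand-in for the validation MSE as a function of (C, sigma, eps)."""
    c, sigma, eps = p
    return (math.log10(c) - 1.0) ** 2 + (sigma - 0.5) ** 2 + (eps - 0.05) ** 2


bounds = [(0.1, 1000.0), (0.01, 10.0), (0.001, 1.0)]
best, best_cost = pso_minimize(toy_fitness, bounds)
```

In a real run, `toy_fitness` would be replaced by a routine that trains the SVR with the candidate hyperparameters and returns the MSE, which makes each fitness evaluation far more expensive than the update arithmetic itself.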
Subsequently, the proposed GaN HEMT model is developed using a distinct algorithm called Sequential Minimal Optimization (SMO) [25] for the optimal kernel function and associated hyperparameters, considering various additive noise functions. The SMO algorithm has the unique feature of solving the quadratic programming (QP) problem without any numerical optimization steps and without extra matrix storage. It typically consists of a couple of stages in which the large QP problem is decomposed into a set of small QP sub-problems. The optimized parameters for the proposed GaN HEMT behavioral model are listed in Table 3.

D. EXPERIMENTAL DISCUSSION
In this section, a series of experiments is conducted to investigate the effect of noise on the proposed behavioral model of the transistor. The proposed model is exposed to different types of noise, Gaussian and non-Gaussian, to identify their effect and to demonstrate the robustness of the proposed approach. It is well known that the conventional lumped-element electrical equivalent model of any device contributes noise of different types and levels at different frequencies when integrated in a CAD tool for circuit and system design. Considering noise as a random process, the study assumes that the proposed model is corrupted by additive noise of different nature, defined by the Gaussian and the Rayleigh (non-Gaussian) noise densities in (16) and (17), respectively.

1) GAUSSIAN NOISE FUNCTION
The proposed model, with the same optimal hyperparameters given in Table 3, is further trained assuming that it is corrupted by an additive noise term N drawn from the probability density function (PDF) known as the Gaussian distribution,
\[ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right) \]
where µ and σ^2 denote the mean and the variance, respectively.

2) RAYLEIGH DISTRIBUTION FUNCTION (NON-GAUSSIAN)
Similarly, another noise function known as the Rayleigh distribution function is considered to test the effect of noise on the proposed model. This continuous non-Gaussian distribution with shape parameter σ is defined by the probability density function, for x > 0,
\[ f(x; \sigma) = \frac{x}{\sigma^{2}} \exp\left( -\frac{x^{2}}{2\sigma^{2}} \right) \]
The motive behind conducting this experiment is to analyze the performance of the proposed behavioral model of GaN HEMT in various noisy environments and the effect of noise on the prediction ability, accuracy and generalization capability of the model and on training parameters such as MSE, loss function and support vectors. In this work, the proposed model is influenced by noise. As this is a data-based model, random Gaussian samples are generated for two mean and variance sets, {0, 1} and {2, 5}, whereas Rayleigh samples are generated for shape parameter (b) and variance sets of {1, 1} and {2, 5}. Overall, 11520 training samples and 5760 testing samples are generated to perform this experiment.
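The two noise sources can be sampled with the Python standard library as sketched below. The Rayleigh samples are drawn by inverse-transform sampling of the distribution function F(x) = 1 − exp(−x²/(2σ²)); the function names, seed and sample count are assumptions of this example.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def rayleigh_pdf(x, sigma):
    """Rayleigh density with parameter sigma, defined for x > 0."""
    return (x / sigma ** 2) * math.exp(-x ** 2 / (2.0 * sigma ** 2)) if x > 0 else 0.0


def sample_gaussian(n, mu, variance, rng):
    """Draw n Gaussian noise samples for a given mean and variance."""
    return [rng.gauss(mu, math.sqrt(variance)) for _ in range(n)]


def sample_rayleigh(n, sigma, rng):
    """Draw n Rayleigh samples by inverting F(x) = 1 - exp(-x^2 / (2 sigma^2))."""
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]


rng = random.Random(42)
gauss_noise = sample_gaussian(20000, mu=2.0, variance=5.0, rng=rng)
rayleigh_noise = sample_rayleigh(20000, sigma=1.0, rng=rng)
```

Each noise sample can then be appended to a predictor row as the additional input, so that the model sees the noise as one more predictor variable.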
The proposed model is re-trained, and six predictor variables (including noise) are now mapped to the feature space with the same set of optimized hyperparameters obtained using PSO in Table 3. It is worthwhile to mention that the initial conditions of all the training runs are stored, and this ensures that the performance of the model is enumerated and assessed under the noise environment without further training. The trained model is then tested under noiseless conditions, and also with the incorporation of Gaussian noise and Rayleigh noise for different sets of mean and variance. Then the training parameters such as training error, loss function, and support vectors are enumerated and compared between the noiseless model and the proposed model under noise in Table 4 and Table 5. It can be clearly inferred that increasing the variance of the noise samples increases the number of support vectors slightly, even though a reduction in the training error is seen. The increase in the support vectors indicates that the regularization parameter C is penalizing more errors, and this leads to a change in the generalization bound of the model due to the noise, which is expected for a pre-trained model. The overall impact is a reduced generalization capability of the model, which may result in some degree of overfitting. Furthermore, almost every error is penalized, with fewer errors admitted in the solution due to the low epsilon, and this indicates a good generalization capability of the model under noiseless conditions.
This section further calculates and discusses the prediction ability (accuracy) and generalization capability of the model under noiseless conditions and under various noise functions with different mean and variance for the complete test set. A very good MSE of the order of 10^{-4} is obtained for the test set, in close proximity to the MSE of the training set, together with a correlation coefficient (R) of over 99% for the test samples, as shown in Table 6. These results demonstrate good fitting and generalization over the complete range for noiseless as well as noisy environments. The prediction ability of the proposed model for the test set decreases by around 1.5% in the presence of noise over the complete sample, which can be considered a very good outcome and confirms that the proposed model is highly immune to noise effects.
Finally, the proposed small-signal modeling approach for GaN HEMT is also compared with the conventional small-signal behavioral modeling approach for GaN HEMT developed using an Artificial Neural Network (ANN). Similar to the work in [29]-[31], the ANN based small-signal behavioral model is developed for the same set of predictors and responses discussed in Section III-B. The desired responses are scaled using the expression in (14). In this context, a feed-forward three-layer Multi-Layer Perceptron (MLP) based ANN model is trained using the Levenberg-Marquardt algorithm with 15 neurons, tan-sigmoid as the activation function, and initialization of weights and biases as discussed in [31], with the same objective function defined in (15) and (16). The training minimizes the objective function to obtain the optimal set of weights and biases. Then the ANN based network is further trained and developed with the additional noise inputs described in (17)-(18), using the same optimal set of weights and biases obtained in the noiseless environment. To validate the performance of the ANN based model, the trained network is subjected to novel test inputs. The prediction ability of the model is enumerated for the noiseless case and for Rayleigh noise, and the obtained results are given in Table 6. It is quite evident that the MSE and R of the phase variables degrade significantly, to the order of 10^{-3} and 98%, respectively, for the ANN based small-signal behavioral model when compared to the proposed SVR based modeling approach under similar Rayleigh noise. The MSE and R of the test set for the ANN model are not close enough to those of the training set, and the error is also greater than that of the proposed SVR model, which signifies a degradation in the prediction and generalization capability of the ANN model.
The excellent performance of the proposed model is due to the fact that SVR works on the principle of structural risk minimization. Here, parameters such as C and the epsilon-insensitive zone keep a regular check on the generalization capability of the model, and the loss function complemented with the PSO ensures a global optimum. In contrast, the ANN works on the principle of empirical risk minimization. In the ANN, the primary objective is only to minimize the errors, and it relies on a local optimum. In addition, the ANN does not have an in-built provision for keeping a regular check on generalization capability. This is the reason that the ANN may perform well on the training set but is unable to validate and generalize the performance for the test set as well. Furthermore, a larger number of parameters must be tuned to optimize the performance of the ANN based model when compared to the proposed SVR model, which relies on just three parameters.
As an experiment, the performance of the ANN based small-signal behavioral model can be improved by increasing the number of hidden layers and the neurons within them, selecting a more appropriate activation function, and tuning several other parameters. In that case, the device engineer needs to consider the trade-off with other factors such as modelling and computational complexity and convergence rate. Even so, the generalization capability and the performance under non-Gaussian noise still remain an issue with the conventional ANN based modeling technique. At the same time, it is worthwhile to mention that such an issue does not arise in the SVR based technique, considering that global optimization in this technique is achieved by tuning only two or three hyperparameters.
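The PSO search over a small hyperparameter set can be illustrated with a minimal sketch. This is not the implementation used in this work: the inertia weight, acceleration coefficients, swarm size, search bounds, and the function `toy_fitness` are all hypothetical choices made for illustration. In practice the fitness would be the validation error of an SVR trained with the candidate (C, epsilon, sigma).

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over the box `bounds` = [(lo, hi), ...] with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [fitness(p) for p in pos]          # personal best fitness values
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clamped to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_fitness(h):
    """Hypothetical stand-in for the validation error of an SVR with (C, epsilon, sigma)."""
    c, eps, sigma = h
    return (c - 10.0) ** 2 / 100.0 + (eps - 0.1) ** 2 + (sigma - 2.0) ** 2


# Assumed search ranges for C, epsilon and sigma (illustrative only)
search_bounds = [(0.1, 100.0), (0.001, 1.0), (0.1, 10.0)]
best, best_val = pso_minimize(toy_fitness, search_bounds)
print("best (C, epsilon, sigma):", best, "fitness:", best_val)
```

Because only three values are searched, a small swarm is sufficient here, whereas an ANN exposes many more weights and structural choices to the optimizer.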

IV. MODEL VALIDATION
In the previous section, the framework for the behavioral modeling of GaN HEMT was established and the model was trained with optimal hyperparameters and kernel functions. This section investigates the reliability and effectiveness of the trained model by subjecting it to a novel set of inputs over the broad frequency range of 1-18 GHz (including frequency extrapolation), with and without noise, utilizing the one-third of the measured data reserved for testing and validation. To study the accuracy of the model more precisely, the prediction ability of the proposed model is evaluated in terms of the mean relative error (MRE), expressed in (18), where y_meas is the measured value and y_predicted is the predicted value of the response variable of the proposed model. The relative error is plotted against the whole frequency range for four different multi-bias sets and for different operating regions of the transistor under noiseless conditions, as depicted in Fig. 9 and Fig. 10, respectively. Similarly, the MRE calculated for the 2 × 200 µm GaN HEMT for different multi-bias conditions over the complete frequency range is given in Table 7.
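As an illustration of how the MRE can be evaluated, the sketch below uses a common definition: the mean of |y_meas − y_predicted| / |y_meas|, expressed in percent. The exact normalization used in this work is given by the expression referenced above, so the function name `mean_relative_error` and the percent scaling are assumptions.

```python
def mean_relative_error(measured, predicted):
    """Mean relative error in percent between measured and predicted values.

    Works for real or complex samples, since abs() returns the magnitude.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Illustrative values only (not measured data)
y_meas = [1.0, 2.0, 4.0]
y_predicted = [1.1, 1.9, 4.0]
mre = mean_relative_error(y_meas, y_predicted)
print(f"MRE = {mre:.2f} %")
```

Note that this definition is undefined when a measured sample is exactly zero, so such points would need separate handling.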
The calculated MRE shows a very small error within the training frequency range (up to 13 GHz) and a spike of 2-5% at the start of the extrapolation set, after which the error decreases and the curve settles down. Altogether, a mean relative error varying between 1.5% and 3.5% for the extrapolation set shows a very promising accuracy of the proposed model. It can be seen, however, that a very high MRE occurs in the plot of S21 in Fig. 10(a). This can be attributed to the high values of C and σ in Table 3. The large value of σ signifies that the support vectors are widely spread, and this negatively influences the flatness of the curve. On the other hand, the large value of the regularization parameter C penalizes the errors too heavily, which in turn increases the number of support vectors. As a consequence, the response variable S21 shows a low training error but starts losing its generalization properties for certain sets of test inputs. Moreover, the increase in the number of support vectors indicates high variance and overfitting of the response variable. For all other S-parameters, the values of C and σ indicate low variance, and hence there is excellent agreement between the measured and modelled values, as can be seen in Figs. 9 and 10, with very low MRE.
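The role of σ can be illustrated with a small sketch of a Gaussian (RBF) kernel. It assumes the common form k(x, z) = exp(−||x − z||² / (2σ²)); the kernel parameterization actually used in this work may differ, and the sample points below are arbitrary.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma^2)) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


x = [0.0, 0.0]
z = [1.0, 1.0]   # squared distance to x is 2

narrow = rbf_kernel(x, z, sigma=0.5)  # small sigma: similarity decays quickly
wide = rbf_kernel(x, z, sigma=5.0)    # large sigma: distant points still look similar
print(f"sigma=0.5 -> {narrow:.4f}, sigma=5.0 -> {wide:.4f}")
```

With a large σ each support vector influences a wide region of the input space, which is one way to read the widely spread support vectors discussed above.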
Furthermore, an excellent agreement is also obtained between the measured data and the proposed model over the broad frequency range of 1-18 GHz, including extrapolation, for different operating regions of the transistor, as depicted in Fig. 11. Finally, the proposed technique is compared with the relevant state-of-the-art in small-signal modeling of GaN HEMT in Table 8. It is apparent that the proposed technique demonstrates superior accuracy and hence advances the state-of-the-art significantly. Overall, it can be inferred that the accuracy of the GaN HEMT model depends mainly on the selection of optimized hyperparameters.

A. STABILITY AND GAIN TEST OF POWER AMPLIFIER (UTILITY OF THE PROPOSED SMALL-SIGNAL MODEL)
Finally, the utility of the proposed small-signal model of GaN HEMT is demonstrated by integrating it into a CAD environment for circuit simulation and analysis. As already discussed, the main application of GaN based HEMT lies in the design of RF Power Amplifiers (PAs). In this section, the stability and gain test of an amplifier is performed in the CAD environment utilizing the proposed model. The complete circuit for the stability and gain test of the amplifier is shown in Fig. 12. The proposed model has been developed in MATLAB. In this context, the section first discusses the establishment of the interface between MATLAB and the CAD environment using the MATLAB Cosimulation feature in Advanced Design System (ADS). MATLAB Cosimulation in ADS Ptolemy provides a MATLAB block that can call a stored script or function in MATLAB. In our work, a function was developed such that the trained model takes the inputs and subsequently produces the response values. The inputs to the model are explicitly specified using a float matrix block and passed into the MATLAB function block, and the responses are subsequently recorded using a sink. The interfacing circuit between ADS and MATLAB is shown in Fig. 13. Once the interface is established between MATLAB and ADS, the stability test for the class-F power amplifier is performed and subsequently the amplifier gain is determined by incorporating the proposed PSO-SVR based small-signal model. Determination of stability is one of the most important aspects of amplification. A transistor amplifier is unconditionally stable for all passive source and load impedances if |Γ_in| < 1 and |Γ_out| < 1, where Γ_in and Γ_out are the reflection coefficients seen from the source and load sides, respectively, and depend on the source and load matching networks. In this context, the K-test is performed to test the stability of the network, where K is known as Rollett's factor [47].
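For reference, the standard textbook form of Rollett's factor in terms of the two-port S-parameters is reproduced below. The notation is the conventional one and is assumed here to match that of [47].

```latex
K = \frac{1 - |S_{11}|^{2} - |S_{22}|^{2} + |\Delta|^{2}}{2\,|S_{12} S_{21}|},
\qquad
\Delta = S_{11} S_{22} - S_{12} S_{21}
```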
The amplifier is said to be unconditionally stable if the condition K > 1 and |Δ| < 1 holds true for the entire operating frequency range. To perform the stability and gain test for amplifiers, it is vital to first determine the quiescent bias point at which the transistor is operating. The bias points are determined from the DC I-V test. The quiescent bias points and the corresponding drain currents (V_GSQ, V_DSQ, I_DS (mA)) obtained for the 2 × 200 µm and 4 × 100 µm GaN HEMTs are (-5.00517, 8.00955, 24.17) and (-5.01596, 8.00955, 47.63), respectively. The simulation setup in Fig. 12 contains a parallel stabilization network in the bias circuitry, with the bias set to the optimal bias values, and the proposed GaN HEMT model. The parallel stabilization network is a kind of high-pass filter which is employed to attain stability for the lower frequency components while leaving the high frequency components unattenuated. The stability factors K and µ are studied (µ is a more recently proposed stability criterion that is well suited to simulators; it declares the system to be unconditionally stable over a frequency range without any exception if µ > 0). The proposed GaN HEMT model is provided with a test input of V_GS = 0 V, V_DS = 6 V, N = 2, w_g = 200 µm over the complete frequency range. The results related to the stability of the amplifier are shown in Fig. 14(a). It can be seen from Fig. 14(a) that K > 1 and µ > 0 for the complete frequency range, which indicates that the amplifier is unconditionally stable. Apart from this, it is evident that the values of µ_source and µ_load are also greater than 1 for the entire frequency range, which implies that the source and load sides are matched. As far as gain is concerned, the maximum available gain and the small-signal voltage gain defined in [47] are obtained for the amplifier as shown in Fig. 14(b). It can be seen from Fig. 14(b) that the maximum stable gain or small-signal voltage gain improves to a good extent with the increase in frequency.
This can be attributed to the fact that a parallel RC stabilization network is used in this simulation framework, which improves attenuation at high frequencies. In practice, a combination of series RC and parallel RC stabilization networks is used to achieve optimum performance.
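The stability quantities discussed above can be computed from a single set of two-port S-parameters, as in the following minimal sketch. It uses the standard textbook expressions for Rollett's K, Δ, the geometrically derived factors µ and µ′, and the maximum stable gain |S21|/|S12|. The function name `stability_metrics` and the sample S-parameter values are hypothetical and are not taken from the measured device, and the mapping of µ and µ′ onto the source and load quantities plotted in Fig. 14 is not assumed here.

```python
import math


def stability_metrics(s11, s12, s21, s22):
    """Return K, |Delta|, mu, mu_prime and MSG (dB) for one frequency point."""
    delta = s11 * s22 - s12 * s21
    loop = abs(s12 * s21)
    # Rollett's stability factor
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2 * loop)
    # Geometrically derived stability factors
    mu = (1 - abs(s11) ** 2) / (abs(s22 - delta * s11.conjugate()) + loop)
    mu_prime = (1 - abs(s22) ** 2) / (abs(s11 - delta * s22.conjugate()) + loop)
    # Maximum stable gain in dB
    msg_db = 10 * math.log10(abs(s21) / abs(s12))
    return {"K": k, "abs_delta": abs(delta), "mu": mu,
            "mu_prime": mu_prime, "MSG_dB": msg_db}


# Hypothetical S-parameters at a single frequency (illustrative only)
result = stability_metrics(complex(0.5, 0.0), complex(0.1, 0.0),
                           complex(2.0, 0.0), complex(0.5, 0.0))
k_test_passed = result["K"] > 1 and result["abs_delta"] < 1
print(result, "K-test passed:", k_test_passed)
```

In a frequency sweep, the same function would be evaluated at every frequency point, and the K-test would need to hold at all of them for the amplifier to be declared unconditionally stable.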

V. CONCLUSION
In this article, a novel PSO-SVR based technique is presented to model the small-signal behavior of the GaN HEMT device. The proposed modeling technique is accurate, fast, and less complex. The key contributions include the development of a kernel based approach for modeling the small-signal behavior of the GaN HEMT device. It picks the optimized set of kernel functions for the response variables of the model from a defined kernel function library. The incorporation of the PSO algorithm then aids in the identification of the optimal sets of kernel functions and associated hyperparameters for accurate modeling of GaN HEMT. The robustness of the proposed model is evaluated by testing it under various noise conditions, frequency extrapolation, and geometric interpolation. It has been shown that the proposed model is robust under various noise conditions and shows a very good frequency extrapolation ability and an excellent geometric interpolation capability. It has also been demonstrated that the proposed SVR based model exhibits far superior accuracy when compared to the conventional ANN based model. Finally, the effectiveness of the proposed model has been shown by integrating it into a CAD environment and performing the stability and gain test for a power amplifier.
[46] A. Khusro  Electronics GmbH, Germany, and the Philips Technology Center, Germany. He is currently an Associate Professor with Nazarbayev University, Kazakhstan, and also holds a faculty position at IIIT Delhi, India. His current research interests include advanced RF circuits, broadband linear and efficient power amplifiers for mobile and satellite applications, high-and low-frequency instrumentation, and wireless power transfer techniques. His research activities have led to one book, three U.S. patents (two pending), and over 185 journal and conference publications. He is an Associate Editor of IEEE Microwave Magazine.
ABDUL QUAIYUM ANSARI (Senior Member, IEEE) received the B.Sc.Engg. degree from Aligarh Muslim University, Aligarh, India, the M.Tech. degree from IIT Delhi, Delhi, India, and the Ph.D. degree from Jamia Millia Islamia, New Delhi, India. He held academic positions at University Polytechnic Jamia Millia Islamia, the College of Computer Science, King Khalid University Kingdom, Saudi Arabia, the Department of Computer Science, Jamia Hamdard (Hamdard University), Delhi, and the Department of Electrical Engineering, Jamia Millia Islamia. He has produced excellent results with proven records through two patent applications, 58 peer-reviewed international journal articles, one book each written and edited, 12 book chapters, and 90 peer-reviewed international conference papers. His research interests include HEMTs and microwave devices, soft computing, computer networks and data communications, embedded systems, and NoC. He is a fellow and a Chartered Engineer of the Institution of Engineering India (IEI) and IETE.
SULTANGALI ARZYKULOV (Member, IEEE) received the B.Sc. degree (Hons.) in radio engineering, electronics, and telecommunications from Kazakh National Research Technical University after K. I. Satpayev, Almaty, Kazakhstan, in June 2010, the M.Sc. degree in communication engineering from The University of Manchester, Manchester, U.K., in 2013, and the Ph.D. degree in science, engineering, and technology from Nazarbayev University, Nur-Sultan, Kazakhstan, in 2019. He is currently a Postdoctoral Scholar with Nazarbayev University, Kazakhstan. His research interests include wireless communication systems, with particular focus on cooperative communications, cognitive radio, energy harvesting, interference mitigation, and NOMA. He acts as a Reviewer for several journals/conferences and served as a Technical Program Committee Member of numerous IEEE Communication Society flagship conferences. VOLUME 8, 2020