Exploiting Support Vector Machine Algorithm to Break the Secret Key

Template attacks (TA) and support vector machines (SVM) are two effective methods in side channel attacks (SCAs). Almost all studies on SVM in SCAs assume that sufficient power traces are available, which also implies that the number of profiling traces belonging to each class is equal. In a real attack scenario, however, there may not be enough power traces due to various restrictions. More specifically, the Hamming Weight of the S-Box output results in 9 binomially distributed classes, which significantly reduces the performance of SVM compared with uniformly distributed classes. In this paper, the impact of the distribution of profiling traces on the performance of SVM is first explored in detail. We also apply the Synthetic Minority Oversampling TEchnique (SMOTE) to solve the problem caused by the binomially distributed classes. By using SMOTE, the success rate of SVM is improved in the testing phase, and SVM requires fewer power traces to recover the key. In addition, TA is selected as a comparison. In contrast to what is perceived as common knowledge in unrestricted scenarios, our results indicate that SVM with proper parameters can significantly


Introduction
Kocher et al. [1] first brought forward power analysis (PA) attacks in 1999. Since then, a variety of PA attacks have emerged, such as differential power analysis (DPA) [1], template attacks (TA) [2], correlation power analysis (CPA) [3], and stochastic model based power analysis (SMPA) [4]. A cryptographic device must protect its secret key regardless of whether the algorithm itself is public. Thus, a crucial requirement is that the key-related information of a cryptographic algorithm must not be disclosed during execution. So far, no cryptographic device has been able to prevent this information from leaking through various side channels. The book [5] comprehensively summarizes PA attacks and countermeasures.
As early as the 1990s, Rivest [6] recognized the similarities between machine learning (ML) and cryptography. In recent years, a large number of ML algorithms have been applied to PA attacks, e.g., multilayer perceptron [7], [8], k-means clustering [9], k-nearest neighbors [10], and support vector machine (SVM) [11][12][13][14][15][16]. Hospodar et al. [11] first applied SVM to PA attacks. Although no real attack was performed, their work provides a novel perspective on how SVM can be used in PA attacks. The first extension to 9 Hamming Weight (HW) classes for SVM was given in [13]. Lerman et al. [15] proposed an ML-based attack against a masked AES implementation. The authors studied ML algorithms mostly using 9 or up to 16 classes. We successfully recovered the secret key by using SVM in [16]. These related contributions suggest that some ML algorithms are effective in PA attacks. Furthermore, the performance of SVM is slightly superior to that of other ML algorithms. However, almost all studies on SVM in PA attacks [12][13][14][15][16] assume that power traces are sufficient to reveal key-related leakage information and that the number of profiling traces belonging to each class is equal. In a real attack scenario, the number of profiling traces per class may differ.
On the one hand, one can consider the S-Box output value itself as the sensitive variable, resulting in 256 uniformly distributed classes. On the other hand, the attack target can also be the HW of an 8-bit intermediate value, resulting in 9 binomially distributed HW classes. Moreover, considering 256 classes yields direct information about the secret key because each class corresponds to exactly one key guess. In contrast, each class is associated with multiple key guesses when using 9 HW classes. For instance, HW class 4 has to handle the largest number of key guesses, with 70 possible values, whereas there is only 1 possible value when the HW value is 0 or 8. This is called data imbalance in the ML community.
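The class sizes quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the names `hw_counts` and `hw_share` are ours and do not come from the original experiments.

```python
from math import comb

# Count how many 8-bit values fall into each Hamming Weight class 0..8.
hw_counts = [0] * 9
for value in range(256):
    hw_counts[bin(value).count("1")] += 1

# The counts are the binomial coefficients C(8, h).
assert hw_counts == [comb(8, h) for h in range(9)]

# Expected share of uniformly random bytes that land in each class.
hw_share = [count / 256 for count in hw_counts]

for h, (count, share) in enumerate(zip(hw_counts, hw_share)):
    print(f"HW {h}: {count:3d} values, {share:6.2%} of traces")
```

The largest class (HW 4) is 70 times more populated than the two smallest ones, which is the imbalance the rest of the paper deals with.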
The authors of [17] confirmed that the separating hyperplane of an SVM trained with imbalanced data skews towards the minority class, and that this skewness reduces the performance of SVM. It therefore seems more advisable to use the S-Box output directly as the label value of an SVM classifier.
However, as the number of classes increases, the computational complexity of SVM also rises. The complexity of multi-class SVM grows as $O(|\Theta|^2)$ when the one-against-one strategy [18] is used, where $|\Theta|$ is the number of classes to be classified. This makes the SVM algorithm inefficient in the parameter tuning phase because of the high computational burden. In light of this, the Hamming Weight model of the S-Box output is selected as the hypothetical power leakage model in our paper. To the best of our knowledge, the importance of the distribution of power traces in PA attacks has not yet been investigated. Therefore, the purpose of this article is to explore the impact of the distribution of profiling traces belonging to each class on the performance of SVM.
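As a worked illustration of this quadratic growth (our arithmetic, assuming one binary classifier per unordered pair of classes, which is how the one-against-one strategy is usually described), the number of binary classifiers for the three class counts considered in this paper is:

$$\binom{|\Theta|}{2} = \frac{|\Theta|\,(|\Theta|-1)}{2}, \qquad \binom{7}{2} = 21, \quad \binom{9}{2} = 36, \quad \binom{256}{2} = 32640.$$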
In this paper, we first calculated the label of each power trace and then predicted the probability that each instance belongs to each class. Finally, the correct key was obtained by the maximum likelihood estimation. All experiments were performed on publicly available power traces. We used the Synthetic Minority Oversampling TEchnique (SMOTE) [19] instead of Different Error Costs (DEC) [20] to compensate for the distribution of HW classes. By using SMOTE, we converted 9 binomially distributed HW classes into 7 uniformly distributed HW classes, which yielded more appropriate SVM parameters in the parameter tuning phase. Our results demonstrated that the success rate of 7 uniformly distributed HW classes was higher than that of 9 binomially distributed ones for SVM in the testing phase. Moreover, SVM-RBF only required about 4 power traces to recover the secret key when classifying 7 classes. The remainder of this article is organized as follows: Section 2 introduces the basic knowledge of profiling attacks and SVM. Section 3 gives the methodologies used in this article. Our experiments and results are presented in Sec. 4. We conclude this article in Sec. 5.

Background
In this section, we briefly introduce the basic information of previous profiling attacks and the SVM algorithm used in this paper.

Profiling Attacks
Profiling and non-profiling attacks are the two main types of PA attacks. Profiling attacks assume that an attacker has an identical cryptographic device that is almost completely under his control. For this device, he is free to set the key and plaintext and then calculate the intermediate value. Thus, an attacker can guess the secret key according to an appropriate power leakage model. Profiling attacks contain two phases: a profiling (learning) phase and a key recovery (attacking) phase. In the profiling phase, the key-related leakage information caused by the intermediate values being processed is characterized by profiling traces. The attacker uses these profiles (features, templates) to predict the correct secret key in the attacking phase.
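TA is a typical profiling attack based on the multivariate Gaussian distribution $\mathcal{N}(t; (m, C))$, as described below:

$$\mathcal{N}(t; (m, C)) = \frac{1}{\sqrt{(2\pi)^{N} |C|}} \exp\left(-\frac{1}{2}(t - m)^{\top} C^{-1} (t - m)\right) \tag{1}$$

where $t$ represents an N-dimensional vector, $m$ is the mean vector, and $C$ is the covariance matrix; the pair $(m, C)$ is called a template.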
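For TA, the attacker builds different templates for different classes, which correspond to different intermediate values, in the learning phase. In the attacking phase, the attacker uses the maximum likelihood estimation as a distinguisher. The log likelihood of each possible key k is as follows [2]:

$$\log L_k \equiv \log \prod_{i=1}^{M_k} \mathcal{N}(t_i; (m_c, C_c)) = \sum_{i=1}^{M_k} \log \mathcal{N}(t_i; (m_c, C_c)) \tag{2}$$

where $M_k$ is the number of power traces belonging to the secret key k, and $(m_c, C_c)$ is the template of the class c associated with trace $t_i$ under key k.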
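The two phases of a template attack can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the implementation used in the experiments: the function names `build_templates`, `log_gaussian`, and `classify` are hypothetical, and each class is assumed to have enough traces for its covariance matrix to be invertible.

```python
import numpy as np

def build_templates(traces, labels):
    """Learning phase: estimate one (mean, covariance) template per class."""
    templates = {}
    for c in np.unique(labels):
        class_traces = traces[labels == c]
        mean = class_traces.mean(axis=0)
        cov = np.cov(class_traces, rowvar=False)  # columns are sample points
        templates[int(c)] = (mean, cov)
    return templates

def log_gaussian(t, mean, cov):
    """Log of the multivariate Gaussian density N(t; (mean, cov))."""
    n = mean.shape[0]
    diff = t - mean
    _, logdet = np.linalg.slogdet(cov)          # log|C|, numerically stable
    maha = diff @ np.linalg.solve(cov, diff)    # (t-m)^T C^-1 (t-m)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha)

def classify(t, templates):
    """Attacking phase for one trace: pick the class with the highest log density."""
    return max(templates, key=lambda c: log_gaussian(t, *templates[c]))
```

Summing `log_gaussian` over all attack traces, with the class of each trace determined by the key hypothesis, gives the per-key log likelihood that is maximized in the attacking phase.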

Understanding the SVM
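Cortes and Vapnik [21] proposed the SVM algorithm to address linear binary classification with high generalization. Consider a training set

$$\{(X_i, y_i)\}_{i=1}^{l}, \qquad X_i \in \mathbb{R}^{N}, \quad y_i \in \{-1, +1\}$$

where $X_i$ is a training vector, and $y_i$ is the label of $X_i$. The training vector $X$ is mapped into feature space by the nonlinear function $\varphi(\cdot)$. Consequently, the maximum margin of a binary-class SVM classifier is a constrained optimization problem as follows:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\omega^{\top}\omega + C \sum_{i=1}^{l} \xi_i \qquad \text{s.t.} \quad y_i\left(\omega^{\top}\varphi(X_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \tag{3}$$

where $\omega \in \mathbb{R}^{N}$, $b \in \mathbb{R}$, $C > 0$ is the penalty parameter which controls the trade-off between training error and margin size, and $\xi_i$ is the training error of $X_i$. After the Lagrange multipliers are introduced, the optimization problem in (3) is simplified as follows:

$$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(X_i, X_j) \qquad \text{s.t.} \quad \sum_{i=1}^{l} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C \tag{4}$$

where $\alpha_i$ are Lagrange multipliers, and the kernel function is

$$K(X_i, X_j) = \varphi(X_i)^{\top} \varphi(X_j). \tag{5}$$

The kernel function maintains the reasonable computational complexity of SVM in feature space. The common kernel functions are the linear kernel ($K_{Linear}$) and the RBF kernel ($K_{RBF}$):

$$K_{Linear}(X_i, X_j) = X_i^{\top} X_j, \qquad K_{RBF}(X_i, X_j) = \exp\left(-\gamma \|X_i - X_j\|^{2}\right), \quad \gamma > 0. \tag{6}$$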
In consideration of training time and accuracy (ACC, the ratio of the sum of true positives and true negatives to the total number of instances), the one-against-one strategy [18] can be used to train an SVM classifier for each pair of possible classes. In order to use the maximum likelihood estimation to recover the secret key, an attacker is more interested in the probability of an instance $X_i$ belonging to the class c. Accordingly, we give the posterior conditional probability $P_{SVM}(X_i \mid c)$ of each instance [23].

Methodologies
In order to ensure the reproducibility of our results, we used a publicly available dataset. The DPA Contest v4 (DPACv4) [24] provides 100,000 power traces of a masked AES software implementation. Since the mask value is known in [16], we can directly convert this dataset to an unprotected scenario. We selected 4000 (DS0) and 8000 (DS1) random power traces to make a fair comparison across all experiments. We only explored how to recover the secret key more efficiently and ignored the mask recovery phase.
Our experimental methodology was as follows: Given a dataset, a random two-thirds was used as the learning set and the remaining one-third was reserved as the testing set. The learning set was divided into training and validation sets by using 10-fold cross validation. The validation sets of all folds were used in the parameter tuning phase. The best parameters (those with the highest average accuracy over all validation folds) were used for training the final SVM model in the testing phase. Finally, the correct key was obtained by the maximum likelihood estimation in the key recovery phase.
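The data partitioning described above can be sketched with the standard library alone. The helper names `split_learning_testing` and `k_fold` are hypothetical, and the fixed seed and the interleaved fold assignment are our assumptions; the text only specifies the two-thirds/one-third split and the use of 10 folds.

```python
import random

def split_learning_testing(n_traces, seed=0):
    """Randomly assign two-thirds of the trace indices to learning, the rest to testing."""
    indices = list(range(n_traces))
    random.Random(seed).shuffle(indices)
    cut = (2 * n_traces) // 3
    return indices[:cut], indices[cut:]

def k_fold(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# DS0 contains 4000 traces according to the text.
learning, testing = split_learning_testing(4000)
print(len(learning), len(testing))
```

Each candidate parameter combination would then be scored by its average validation accuracy over the ten `(training, validation)` pairs.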

Feature Selection
Our dataset is focused on all bytes (0 to 15) of the first round key of AES. Although it is a software implementation, the most leaking operation is not register reading or writing, but the S-Box operation of the first round of AES. As shown in Fig. 2, the HW model is used to characterize the hypothetical power consumption of the S-Box output. The HW value of the S-Box output, i.e., $HW(Sbox[t_i \oplus k_i])$, $i = 0, 1, \ldots, 15$, is selected as the label of an SVM classifier. Here $t_i$ represents the ith byte of a random plaintext, $k_i$ denotes the ith byte of the fixed secret key, and $Sbox[\cdot]$ is a substitution operation. Consequently, the label value of an SVM classifier corresponds to the HW value from 0 to 8. In this case, the number of power traces belonging to each HW class obeys the binomial distribution.
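The labelling rule can be made concrete as follows. To keep the sketch self-contained, the AES S-Box is generated from its algebraic definition (inversion in GF(2^8) followed by the affine transformation) instead of being pasted as a table; the helper names `gf_mul`, `gf_inv`, `sbox_byte`, and `hw_label` are ours, and the mask handling of DPACv4 is deliberately left out.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0

def sbox_byte(x):
    """AES S-Box: field inversion followed by the affine transformation."""
    inv = gf_inv(x)
    y = inv
    for shift in range(1, 5):
        y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return y ^ 0x63

SBOX = [sbox_byte(x) for x in range(256)]

def hw_label(plaintext_byte, key_byte):
    """Label of a trace: Hamming Weight of Sbox[t XOR k], an integer in 0..8."""
    return bin(SBOX[plaintext_byte ^ key_byte]).count("1")
```

In the profiling phase the key byte is known, so `hw_label` gives the class of every profiling trace; in the attacking phase it is evaluated for each of the 256 key hypotheses.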
Following [16], the interesting points were extracted from the 16 S-Boxes. We calculated Pearson correlation coefficients between each sample instant of the power traces and the HW of the S-Box output to locate interesting points. The 32 most highly correlated sample instants were then selected as interesting points. As we can see from Fig. 3, for the eighth S-Box, most of the sample instants have no prominent power leakage. We omit the details about the remaining S-Boxes due to the lack of space.
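A possible NumPy sketch of this selection step is given below. The function name `select_points_of_interest` is hypothetical, and ranking by the absolute value of the correlation is our assumption, since the text does not say how negative correlations are treated.

```python
import numpy as np

def select_points_of_interest(traces, hw_labels, n_points=32):
    """Return the indices of the n_points sample instants most correlated with the labels.

    traces: array of shape (n_traces, n_samples); hw_labels: array of shape (n_traces,).
    """
    traces = np.asarray(traces, dtype=float)
    labels = np.asarray(hw_labels, dtype=float)
    t_centered = traces - traces.mean(axis=0)
    l_centered = labels - labels.mean()
    # Pearson correlation of every column with the label vector.
    cov = t_centered.T @ l_centered
    norm = np.sqrt((t_centered ** 2).sum(axis=0) * (l_centered ** 2).sum())
    corr = np.divide(cov, norm, out=np.zeros_like(cov), where=norm > 0)
    # Assumption: rank by absolute correlation, strongest first.
    order = np.argsort(-np.abs(corr))
    return np.sort(order[:n_points]), corr
```

The returned indices can be used to reduce every trace to a 32-dimensional feature vector, e.g. `features = traces[:, poi]`.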
Here we only used the Pearson correlation method for feature selection. In addition, many signal preprocessing techniques can also be used to choose interesting points in PA attacks, e.g., minimum redundancy maximum relevance (mRMR) [25] and principal component analysis (PCA) [26].

SMOTE
The strategies for handling data imbalance in SVM are divided into algorithm level and data level methods [27], [28]. Algorithm level methods focus on modifying existing algorithms to mitigate their bias towards the majority class. The Different Error Costs (DEC) method, proposed in [17], is a typical representative of this category; it replaces the single cost C shared by minority and majority misclassifications, as given in (7) below:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\omega^{\top}\omega + C^{+} \sum_{i \,:\, y_i = +1} \xi_i + C^{-} \sum_{i \,:\, y_i = -1} \xi_i \qquad \text{s.t.} \quad y_i\left(\omega^{\top}\varphi(X_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \tag{7}$$

where $C^{+}$ is the misclassification cost for the minority class, while $C^{-}$ is the misclassification cost for the majority class.
The DEC method improves SVM by assigning a higher misclassification cost to the minority class instances (i.e., $C^{+} > C^{-}$). The improved SVM algorithm is then less inclined to skew the separating hyperplane towards the minority class instances, which reduces the total number of misclassifications. Here we simply set $C^{+}/C^{-}$ equal to the ratio of the minority examples to the majority examples [20].
At the data level, the state-of-the-art methods can be categorized into over-sampling, under-sampling, combinations of under- and over-sampling, and ensemble learning methods [29]. Compared with other sampling techniques, SMOTE is regarded as one of the most powerful and has been very successful in many applications [19]. SMOTE creates synthetic data based on similarities among existing minority examples in feature space. The minority class is over-sampled by taking each minority class sample and introducing synthetic examples along the line segments joining it to any of its k nearest neighbors in the minority class. Synthetic examples are created in the following way: calculate the difference between the selected feature vector and one of its nearest neighbors, multiply this difference by a random number between 0 and 1, and add it to the selected feature vector. This selects a random point on the line segment between the two feature vectors.
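The interpolation rule just described can be sketched directly in NumPy. This is a simplified re-implementation for illustration only, not the imbalanced-learn code used in the experiments; the function name `smote_samples` and its parameters are hypothetical, and the brute-force neighbor search is only suitable for small minority classes.

```python
import numpy as np

def smote_samples(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples and their neighbors."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    # Brute-force squared Euclidean distances between all minority samples.
    dists = ((minority[:, None, :] - minority[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, minority.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(n)                  # selected feature vector
        j = neighbours[i, rng.integers(k)]   # one of its k nearest neighbors
        gap = rng.random()                   # random number in [0, 1)
        synthetic[s] = minority[i] + gap * (minority[j] - minority[i])
    return synthetic
```

Every synthetic point lies on the segment between two real minority samples, which is why classes with almost no real samples cannot be represented faithfully.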
The HW of an 8-bit S-Box output results in 9 binomially distributed classes. Naturally, this distribution does not provide an equal number of power traces for each HW class. Moreover, in our datasets, the power traces belonging to HW classes 0 and 8 account for only about 0.8% ($2 \times \frac{1}{256}$) of all traces. With so few real traces, synthetic power traces cannot represent the true distribution of HW class 0 or 8. Hence, HW classes 0 and 8 in our datasets are discarded directly, and we then set different numbers of nearest neighbors for the remaining HW classes (1 to 7). In this way, we obtain 7 uniformly distributed HW classes by using SMOTE. As a comparison, we report the experimental results of classifying 7, 9, and 256 classes in the next section.

Experiments and Results
LIBSVM (Library for Support Vector Machines) [30] was used as the framework for conducting our attacks. All experiments were performed on an Asus laptop with a 2.50 GHz Intel Core (TM) i5-7200U and 16 GB of 2133 MHz DDR4 memory (Windows 10 x64). The attack lasted about 12 weeks, excluding the time needed to create the two datasets.

Parameter Tuning Phase
No single learning method effectively covers all attack scenarios in the parameter tuning phase. Following [31], we selected the penalty parameter C from 0.01 to 256 with a step of 2, epsilon (the tolerance of the termination criterion) from 0.01 to 0.25 with a step of 0.05, and the hyperparameter γ in (6) from 0.001 to 32 with a step of 2. Here we give the parameter ranges but omit the tuning details. An open-source Python toolbox, namely imbalanced-learn [32], was used to generate synthetic power traces. In Tabs. 1 and 2, the success rates of SVM-RBF for all S-Boxes are given in ACC C/γ form. All values of ACC are given in percentages, and we provide the parameter combinations (penalty parameter C and hyperparameter γ) reaching those values. The success rate of 7 uniformly distributed classes is significantly higher than that of 9 binomially distributed ones for SVM-RBF. This indicates that the distribution of profiling traces affects the performance of SVM-RBF. When the dataset size is expanded from DS0 to DS1, the success rate of 9 binomially distributed classes improves by less than 1.5%. Notably, the success rate of 7 HW classes using DS0 is essentially equivalent to that of 9 binomially distributed ones using DS1. In other words, our method improves the performance of SVM-RBF without increasing the number of profiling traces, which is an interesting aspect for PA attacks.
For SVM-RBF, the success rate of 256 classes is significantly lower than that of 7 and 9 classes. This can be explained by the fact that the number of profiling traces is not enough to train good parameters to classify 256 classes. However, when considering random classification, there is a 1/9 chance of a successful guess for 9 classes, while there is only a 1/256 chance of a random hit in the 256-class scenario. The success rate of 256 classes is clearly higher than that of a random guess. The results may be further improved through a more exhaustive parameter tuning phase, which requires more profiling traces and a longer tuning time. Nevertheless, the complexity of parameter tuning makes it difficult to give theoretical explanations for the performance of SVM. Furthermore, the high complexity of the attack method makes the investigated algorithm unattractive for some security evaluation scenarios.
We also used SVM-RBF with DEC to address the problem caused by data imbalance. Figure 4 gives the success rate of SVM-RBF with DEC when using DS1 to classify 9 binomially distributed HW classes. Compared to using the same cost, the performance of SVM-RBF with DEC has hardly improved, and it is even worse for some S-Boxes. The reason may be that the strategy described in Sec. 3.2 for setting the penalty parameters is inappropriate. However, the penalty parameters would otherwise have to be calculated iteratively, which is difficult in a real problem. Hence, in all subsequent experiments, we do not report the results of SVM using the DEC method.

Testing Results
In this section, we only report the results of SVM-Linear, SVM-RBF with the best parameter combinations, and TA when using DS0 and DS1. Our experiments were executed on the independent testing set to verify the performance of SVM and TA for classifying 7, 9, and 256 classes. Note that for SVM and TA, we used the same datasets and the same interesting points. In order to make our experimental results reliable, each experiment was repeated ten times and the average score was taken as the final result.
The testing results are given in the form ACC/AveP/F-measure for SVM-Linear and SVM-RBF, while for TA we only give the success rate. Here, F-measure (F1-score) is the harmonic mean of the precision and recall, where precision is the ratio of true positives to predicted positives, while recall is the ratio of true positives to actual positives [33]. The receiver operating characteristic curve is usually used to present the results of binary classification problems with uniformly distributed classes. However, when dealing with highly skewed datasets, the precision-recall curve provides more information about the performance of a learning algorithm [34]. The average precision (AveP) is defined as the area under the precision-recall curve. In Tabs. 3, 4, and 5, all values of ACC are given in percentages, while AveP and F-measure are in the range [0, 1]. The higher the value, the better the result.
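The definitions of ACC, precision, recall, and F-measure can be illustrated with a small pure-Python sketch. The function names are ours, and the sketch computes the F-measure per class; how the per-class values were averaged for the tables is not stated in the text, so no averaging is shown.

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts instances of actual class a predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(matrix):
    """Share of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[c][c] for c in range(len(matrix))) / total

def f_measure(matrix, c):
    """Harmonic mean of precision and recall for class c (0.0 if c is never hit)."""
    true_pos = matrix[c][c]
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(row[c] for row in matrix)  # predicted positives
    recall = true_pos / sum(matrix[c])                    # actual positives
    return 2 * precision * recall / (precision + recall)

# Toy example with three classes.
actual = [0, 0, 1, 1, 1, 2]
predicted = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(actual, predicted, 3)
print(accuracy(cm), [f_measure(cm, c) for c in range(3)])
```

A classifier that assigns every instance to the majority class can still reach a high ACC on skewed data while its F-measure for the minority classes is zero, which is why both metrics are reported.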
As expected, the success rate of SVM-RBF in the parameter tuning phase is higher than in the testing phase because the model has to generalize to unseen traces. For 256 classes, the success rates of SVM-RBF are generally reduced by 3% to 10%; in the worst case, the success rate drops from 39.8% to 30.07% (see S-Box3, DS0, Tab. 3). For the 16 S-Boxes, the success rates of SVM clearly differ because of the differences in their power consumption leakage. When the dataset size is expanded from DS0 to DS1, the success rate of SVM increases by more than 10% for most S-Boxes. That is, the larger the dataset, the higher the success rate. The reason is that the performance of SVM is determined by its parameters, and the dataset size is critical to parameter optimization. Moreover, the success rate of SVM-RBF is 1% to 6% higher than that of SVM-Linear when using DS1 in Tab. 3.
From Tabs. 4 and 5, we can see that the testing results of 7 uniformly distributed classes are much better than those of 9 binomially distributed ones. In particular, the success rates of 7 uniformly distributed classes are at least 3% higher than those of 9 binomially distributed ones for S-Box0, 2, 5, 8, 11, and 14. For SVM and TA, the success rate of 7 uniformly distributed classes using DS0 is higher than that of 9 binomially distributed ones using DS1, which is a surprising result. Generally, the performance of SVM improves as the number of profiling traces increases. In this case, SMOTE is used to compensate for the distribution of the existing learning set. The resulting success rate of SVM is nevertheless higher than that obtained with more profiling traces in the training phase, owing to the use of synthetic power traces for the minority classes. This indicates that the performance of SVM depends on the distribution of the number of profiling traces belonging to each class. Additionally, kernel functions play an important role in improving the performance of SVM. SVM-RBF has a higher success rate than SVM-Linear when classifying 7 and 9 classes. Moreover, the success rate of SVM-RBF using DS0 is higher than that of SVM-Linear using DS1. The hyperparameter γ in (6) provides greater flexibility for SVM-RBF. Inevitably, SVM-RBF also takes more time to find the optimal separating hyperplane in the parameter tuning phase.
Although the success rate gives the impression that SMOTE improves the performance of SVM, AveP and F-measure allow the testing results to be analyzed from another point of view. The AveP of 9 HW classes is scattered between 0.68 and 0.95, while that of 7 classes is between 0.76 and 0.99. By looking at the confusion matrix (a matrix where each row represents the instances in an actual class while each column represents the instances in a predicted class), we find that an SVM classifier assigns all instances to the class with more profiling traces when distinguishing between HW class 0 (or 8) and another class. Naturally, classifying all instances into a single class will not lead to a successful attack, because it does not reveal any information about the secret key. The AveP values of 256 uniformly distributed classes are between 0.09 and 0.57 in Tab. 3, which are significantly lower than those of 7 and 9 classes in Tabs. 4 and 5. This is because the number of profiling traces is not sufficient to train high-precision classifiers when classifying 256 classes. In general, the higher the ACC value, the higher the F-measure value, and the F-measure is slightly lower than ACC. Thus, we do not discuss F-measure in detail due to the lack of space.
We also used the standard TA approach to compare the success rates achievable in the same attack scenario. TA is considered to be the most powerful attack technique from an information theoretic point of view; it assumes that the sample points of each trace follow a multivariate Gaussian distribution. As shown in Tabs. 3, 4, and 5, the success rate of TA is clearly lower than that of SVM when using DS0 and DS1 to classify 7, 9, and 256 classes. Compared with SVM, the numerical instability of TA becomes apparent when the profiling traces are not enough to reveal the key-related leakage information. In addition, the success rate of TA does not increase significantly as the number of profiling traces increases. However, the success rate of 7 uniformly distributed classes for TA is higher than that of 9 binomially distributed ones. In short, our testing results demonstrate that using SMOTE to compensate for the distribution of HW classes is also effective for TA. Furthermore, SVM (especially SVM-RBF) can significantly outperform the classical TA when properly used.
In order to make our experiments more convincing, our attacks were executed hundreds of times, and we then computed statistics of the success rate. Figures 5 and 6 present box plots that summarize the success rates of SVM and TA when using DS1 to classify 9 and 7 classes, respectively. In each box plot, the central bar corresponds to the median (the second quartile), and the green triangle represents the mean. The bottom and top of the box are the first and third quartiles, and the whiskers extend to the maximum and minimum values excluding outliers. A value is considered an outlier when it lies more than 3/2 times the interquartile range beyond the upper or lower quartile. Consistent with our hypothesis, SMOTE can effectively mitigate the problems caused by data imbalance, which improves the performance of SVM and TA. Moreover, the success rate of SVM (especially SVM-RBF) is higher and more concentrated than that of TA.

Key Recovery Phase
The maximum likelihood estimation assumes that multiple power traces can be used to recover the secret key, so the success rate is not suited as a measure. Instead, the guessing entropy [35] can be used to evaluate the number of remaining keys. The guessing entropy is defined as follows: let g contain the ranking of all possible keys in descending order of probability, and let i represent the position of the correct key in g. After performing s experiments, one gets a matrix $[g_1, g_2, \ldots, g_s]$ and a corresponding vector $[i_1, i_2, \ldots, i_s]$. The guessing entropy is the average of the positions in this vector, so the number of traces at which it reaches one represents the average number of traces required to recover the correct key. Hence, the guessing entropy is selected as a metric in the key recovery phase.
Tab. 6. The number of power traces required by SVM and TA when the guessing entropy is set to one ($GE_1$).
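The guessing entropy defined in this section can be computed as in the following sketch, which assumes that every experiment yields one score (probability or log likelihood) per key candidate. The function names `key_rank` and `guessing_entropy` are hypothetical.

```python
def key_rank(scores, correct_key):
    """Position (1 = most likely) of the correct key in the descending score ranking."""
    ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranking.index(correct_key) + 1

def guessing_entropy(score_vectors, correct_key):
    """Average position of the correct key over all experiments."""
    ranks = [key_rank(scores, correct_key) for scores in score_vectors]
    return sum(ranks) / len(ranks)

# Two toy experiments over three key candidates; the correct key is candidate 1.
experiments = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
print(guessing_entropy(experiments, correct_key=1))
```

A guessing entropy of one means that the correct key is ranked first in every experiment.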
In this section, SVM and TA were used to recover the secret key when classifying 7, 9, and 256 classes. Instead of predicting the class c of each trace, we gave the posterior conditional probability $P_{SVM}(X_i \mid c)$. The key that maximizes the log likelihood in (2) is selected as the correct key. As described in Sec. 3.1, for all possible subkeys $k_i^{*}$ (0x00 to 0xff) and the ith byte $t_i$ of a random plaintext, the value of $HW(Sbox[t_i \oplus k_i^{*}])$ might be 0 or 8. Since the plaintext is uniformly distributed, the probability of HW classes 0 and 8 is about 0.8% in our datasets. Note that for 7 HW classes, we only gave the probability of each instance belonging to HW classes 1 to 7 in the testing phase. To solve this problem, the probability of belonging to HW class 0 or 8 was set to one-tenth of the minimum of these probabilities, for reasons of efficiency and effectiveness. Figures 7, 8, and 9 report the guessing entropy of SVM-Linear, SVM-RBF, and TA as a function of the number of power traces when using DS1 to classify 7, 9, and 256 classes, respectively. The gray curves show the guessing entropy of the 16 S-Boxes, and their average is taken as the target result. In Tab. 6, we give the number of traces required by SVM and TA when the guessing entropy is set to 1 ($GE_1$). SVM-RBF requires the fewest power traces to recover the key when using DS1 to classify 7 classes, namely 3.97 traces on average. Due to the differences in power consumption between the 16 S-Boxes, the number of traces required to recover the secret key varies greatly. For some S-Boxes, SVM-RBF even needs only one trace to recover the key when classifying 7 or 256 classes. However, SVM and TA require more than one trace when classifying 9 binomially distributed HW classes.
Figure 10 illustrates the overall time required to perform an attack when using DS1 to classify 7, 9, and 256 classes. As the number of classes to be distinguished is reduced, the overall time of SVM and TA decreases greatly. Compared with TA, SVM has a much lower computational burden. We can see that TA takes the most time and SVM-Linear the least when classifying 7, 9, and 256 classes. In addition, the overall time required to classify 7 HW classes is much lower than that of 9 HW classes when using SVM. As described in the previous section, SVM (especially SVM-RBF) has a higher success rate and requires fewer power traces to recover the key when classifying 7 HW classes. In general, there is a trade-off to be made between accuracy and efficiency, and this article is no exception. Since SMOTE is used to compensate for the distribution of HW classes, we need more time to obtain 7 uniformly distributed classes. However, SMOTE is executed before the parameter tuning phase and can therefore be carried out independently of the attack phase. For this reason, the time required by SMOTE is not included in Fig. 10. From the above discussion, we believe that SMOTE can solve the problem caused by data imbalance efficiently.
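The way per-trace class probabilities are combined into a key guess in this section, including the one-tenth rule for the discarded HW classes 0 and 8, can be sketched as follows. The function names, the dictionary layout of `class_probs`, and the `floor_ratio` parameter are our assumptions; the S-Box is passed in as a lookup table so that the sketch stays self-contained, and all probabilities are assumed to be strictly positive.

```python
import math

def hamming_weight(value):
    return bin(value).count("1")

def key_log_likelihoods(plaintext_bytes, class_probs, sbox, floor_ratio=0.1):
    """Accumulate the log likelihood of each of the 256 subkey hypotheses.

    class_probs[i] maps each modelled HW class (1..7) to the probability that
    trace i belongs to it. Classes missing from the map (0 and 8) receive
    floor_ratio times the smallest modelled probability of that trace.
    """
    scores = [0.0] * 256
    for t, probs in zip(plaintext_bytes, class_probs):
        floor = floor_ratio * min(probs.values())
        for k in range(256):
            hw = hamming_weight(sbox[t ^ k])
            scores[k] += math.log(probs.get(hw, floor))
    return scores

def recover_key(plaintext_bytes, class_probs, sbox):
    """Return the subkey hypothesis with the maximum log likelihood."""
    scores = key_log_likelihoods(plaintext_bytes, class_probs, sbox)
    return max(range(256), key=lambda k: scores[k])
```

Feeding the scores obtained after 1, 2, 3, ... traces into a key-ranking routine gives the guessing entropy curves as a function of the number of traces.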

Conclusions
As described in the above sections, PA attacks can be regarded as a classification problem in the ML community. SVM and TA create features (templates) to characterize the power traces of the training set and then calculate the similarity between these features and new traces of the testing set. Ultimately, the results are provided with a certain probability. Generally, TA assumes that the sample points of power traces are approximated by a finite set of normal distributions. ML algorithms, in contrast, assume that sample points are independent and identically distributed, but are not limited to a particular distribution. Consequently, SVM can extract more information about the secret key than TA by analyzing the same interesting points. This paper discusses the effect of the distribution of profiling traces belonging to each class on the performance of SVM in PA attacks. The SVM algorithm optimizes the overall accuracy without considering the distribution of profiling traces, and therefore tends to perform poorly on highly skewed datasets. SMOTE is used to compensate for this deficiency when classifying the binomially distributed HW classes. The results demonstrate that the success rate of 7 uniformly distributed classes is higher than that of 9 binomially distributed ones for SVM and TA in the testing phase. Additionally, the performance of SVM with proper parameters is superior to that of TA. SVM-RBF requires fewer than 4 power traces on average to recover the secret key when using DS1 to classify 7 classes. Further analysis indicates that SMOTE significantly improves the performance of SVM in terms of attack effectiveness and efficiency.

Figure 1 illustrates the framework of our experimental procedures and the concepts involved in this article. SMOTE is used to compensate for the binomially distributed HW classes after the execution of feature selection. The parameter tuning phase finds the best parameters for the training and testing phases. Each experiment was repeated ten times in the loop block. The testing results are given in the form ACC/AveP/F-measure. The guessing entropy is used to evaluate the number of remaining keys.

Fig. 1. Block diagram of the framework of our experimental procedures and the concepts involved in this article.

Fig. 4. Comparison of the success rate of SVM-RBF when using DS1 to classify 9 HW classes.

Tab. 5. Testing results (ACC/AveP/F-measure) of SVM and TA for 7 classes using power traces of DS0 and DS1.

Fig. 5. Box plot of SVM and TA for 9 classes by using DS1.

Fig. 6. Box plot of SVM and TA for 7 classes by using DS1.
Tab. 4. Testing results (ACC/AveP/F-measure) of SVM and TA for 9 classes using power traces of DS0 and DS1.