Trade-off between Bagging and Boosting for quantum separability-entanglement classification

Certifying whether an arbitrary quantum system is entangled or not is, in general, an NP-hard problem. Though various necessary and sufficient conditions have been explored for lower-dimensional systems, it is hard to extend them to higher dimensions. Recently, an approach combining ensemble bagging with the convex hull approximation (CHA), together called BCHA, was proposed; it strongly suggests employing machine learning techniques for the separability-entanglement classification problem. However, BCHA only incorporates the balanced dataset for classification tasks, which results in lower average accuracy. To address the data imbalance problem in the present literature, the Boosting technique is explored, and a trade-off between Boosting-based and Bagging-based ensemble classifiers is examined for quantum separability problems. For the two-qubit and two-qutrit quantum systems, the pros and cons of the proposed random under-sampling boost CHA (RUSBCHA) are compared with the state-of-the-art CHA and BCHA approaches. As the data is highly unbalanced, performance measures such as overall accuracy, average accuracy, F-measure, and G-mean are evaluated for a fair comparison. The outcomes suggest that RUSBCHA is an alternative to the BCHA approach. Moreover, in several cases, performance improvements are observed for RUSBCHA because the data is imbalanced.


I. INTRODUCTION
Nowadays, machine learning (ML) is increasingly employed to tackle harder problems in quantum information science. In recent years, it has been applied in state classification [1][2][3], state reconstruction [4], parameter estimation [5], and many others [6][7][8][9][10][11][12][13]. The motivation behind using ML in quantum information is to gain more insight into problems where the usual numerical techniques either fail or need more resources, e.g., optimization tasks in highly constrained or nonconvex scenarios.
To decide whether an arbitrary quantum state is entangled or not is an NP-hard problem [14]. It is one of the longstanding fundamental issues in entanglement theory. A state $\rho_{AB}$ of a composite system with subsystems A and B is said to be separable if $\rho_{AB} = \sum_i p_i \, \rho_i^A \otimes \rho_i^B$, where $p_i \geq 0$ represents the classical mixing probability with $\sum_i p_i = 1$. Otherwise, it is an entangled state. There exist numerous criteria to detect bipartite entanglement; however, these criteria are less reliable for higher-dimensional systems. For example, the popular Peres-Horodecki criterion states that separable states have a positive partial transpose (PPT) [15,16], meaning that for separable states $\rho_{AB}^{T_A} \geq 0$, where $T_A$ denotes transposition on system A. The criterion is necessary and sufficient for $d_A d_B \leq 6$, where $d$ denotes the system dimension. Other extant methods include entanglement witnesses, the reduction criterion, and the cross-norm or realignment criterion, to name a few [17]. The most powerful technique is the k-extension hierarchy, but it is notoriously hard to compute because its complexity grows exponentially with k [18,19]. Recently, Ref. [1] showed that ML techniques are instrumental in probing separability-entanglement classification. It was reported that the ML-based technique is more efficient in terms of speed and accuracy than the extant methods. A couple more ML-based techniques, using artificial neural networks, have also been studied for separability-entanglement classification [20,21].
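As an illustration of the PPT criterion described above, the following is a minimal sketch that computes the partial transpose on subsystem A and checks its smallest eigenvalue. The function names, the numerical tolerance, and the two example states are assumptions made for this sketch and are not taken from Ref. [1].

```python
import numpy as np

def partial_transpose(rho, d_a, d_b):
    """Transpose subsystem A of a (d_a*d_b) x (d_a*d_b) density matrix."""
    # View rho with indices (a, b, a', b'), swap a <-> a', and flatten back.
    r = rho.reshape(d_a, d_b, d_a, d_b)
    return r.transpose(2, 1, 0, 3).reshape(d_a * d_b, d_a * d_b)

def is_ppt(rho, d_a, d_b, tol=1e-12):
    """Return True if the partial transpose has no eigenvalue below -tol."""
    eigenvalues = np.linalg.eigvalsh(partial_transpose(rho, d_a, d_b))
    return bool(eigenvalues.min() >= -tol)

# Maximally mixed two-qubit state: separable, hence PPT.
mixed = np.eye(4) / 4
# Bell state (|00> + |11>)/sqrt(2): entangled, hence not PPT.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(psi, psi.conj())

print(is_ppt(mixed, 2, 2), is_ppt(bell, 2, 2))
```

For $d_A d_B \leq 6$ such a check decides separability; for larger systems a PPT outcome alone leaves the question open, which is where the CHA-based methods discussed in this work come in.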
Ref. [1] employed the convex hull approximation (CHA) to probe the separability-entanglement boundary using a supervised learning scheme. To reduce the classification error of CHA, the bagging method [22] was invoked. This new method is known as bagging CHA (BCHA). It increases the speed and accuracy of data manipulation, as it divides the whole process into smaller units and then runs them in parallel. Ref. [1] demonstrates its results for two-qubit and two-qutrit systems with fairly high accuracy.
In this work, building on the approaches of Ref. [1], we propose an alternative method that addresses some important issues, with further accuracy improvements for separability-entanglement classification using ML. In particular, we notice that the earlier work a) does not address the issue of handling data imbalance, and b) does not explore all extant performance measures.

II. SETTING UP THE STAGE

A. Supervised learning
Supervised learning is a method of developing artificial intelligence that involves training a computer algorithm on input data that has been labeled for a certain output [23]. In order to apply it to real-time data, the model is trained until it can discover the underlying patterns and relationships between the input data and the output labels, allowing it to produce accurate classification results.
For supervised learning, the system is supplied with labeled datasets throughout its training phase, which tell it what output is associated with each specific input. The trained model is then evaluated with test data, which is labeled data with the labels hidden from the algorithm [24]. Further, the unlabeled testing data is used to determine how well the algorithm performs the classification task [25].
To create the learning dataset, we consider a bipartite quantum state $\rho_{AB}$ of dimension $d = d_A d_B$. The training dataset is then defined as $\Omega_{train} = \{(x_i, y_i) \mid i = 1, \dots, n\}$, where $x_i$ is the $i$-th sample and $y_i$ is its corresponding class label, with $y_i = 1\,(0)$ if the state is separable (entangled). Data labeling for $d_A d_B \leq 6$ is performed using the PPT criterion. For higher dimensions, the labeling follows Appendix C of Ref. [1].
In supervised learning, the main aim is to find a classifier (indicator function) $\Theta : V \to \{0, 1\}$ that best fits the training data among a class of functions $F$. As the present quantum entanglement problem is a binary classification problem, the error expresses the misclassification rate over the two classes. For any training data $\Omega_{train}$ consisting of $n$ samples, each with a feature vector $x_i \in V$ and a target class label $y_i \in \{0, 1\}$, the loss function $L$ for any binary classifier $\Theta$ can be represented as $L(\Theta, \Omega_{train}) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[y_i \neq \Theta(x_i)]$, where $\mathbb{1}[\cdot]$ is the truth function of its argument. For any test data $\Omega_{test}$, the value of $L(\Theta, \Omega_{test})$ depicts the generalization error from $\Omega_{train}$ to $\Omega_{test}$.
It was found that numerous extant supervised learning algorithms, e.g., support vector machine (SVM) [26], decision tree [27], and boosting [28], do not provide acceptable accuracy for the separability problem [1]. This is due to the complex structure of the set of separable states, which led the authors of Ref. [1] to the following consideration.

B. Combining CHA with supervised learning
The set of all separable states, $\Omega_1$, is convex and compact, and its extreme points are all pure product states. Using this fact, one can approximate $\Omega_1$ by the convex hull $C$ of $m$ product states $\{c_i\} \in V$, i.e., $C := \mathrm{conv}\{c_i \mid i = 1, \dots, m\}$. $C$ is the CHA of $\Omega_1$, and one can decide whether an unknown state $\rho$ is separable by examining whether its feature vector $x$ is in $C$. Equivalently, the question can be settled by solving a linear program for a scalar $\alpha$, which depends on both $C$ and $x$. If $x$ is in $C$, then the corresponding state $\rho$ is separable; otherwise $\rho$ is very likely an entangled state. More specifically, $\rho$ is separable when $\alpha \geq 1$ and entangled otherwise. We denote the maximal $\alpha$ for a chosen value of $m$ as $\alpha^{m}_{\max}$. Increasing $m$ approximates $\Omega_1$ better and therefore improves the classification. Adding more extreme points to the convex approximation evidently increases the accuracy of the above algorithm; however, it is very time-consuming. To overcome this, Ref. [1] used CHA in combination with supervised learning. The training data is now defined as $\Omega_{train} = \{(x_i, \alpha_i, y_i) \mid i = 1, \dots, n\}$ and the loss function of the classifier $\Theta$ is redefined as $L(\Theta, \Omega_{train}) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[y_i \neq \Theta(x_i, \alpha_i)]$, where $\alpha_i$ is the outcome of CHA for the $i$-th random density matrix, obtained by solving the linear program that tests whether $x_i$ is in $C$. Note that CHA uses the threshold $\alpha \geq 1$ to classify a state as separable (1) or entangled (0). The value of $\alpha$ acts as an additional feature from which the classifier learns the model. In Ref. [1], bagging-based classification is performed on this feature space, known as Bagging CHA (BCHA). The Bagging and Boosting approaches are discussed next.
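The linear program behind CHA can be sketched as follows. The sketch assumes that the program has the form: maximize $\alpha$ subject to $\alpha x = \sum_i \lambda_i c_i$, $\lambda_i \geq 0$, $\sum_i \lambda_i = 1$, and that the origin of the feature space lies inside $C$; both are assumptions of this illustration rather than statements taken from the text. The helper name `cha_alpha` and the toy two-dimensional hull are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def cha_alpha(x, hull_points):
    """Largest alpha such that alpha * x lies in conv(hull_points).

    hull_points: array of shape (m, D), one point c_i per row.
    x: feature vector of shape (D,).
    """
    c_pts = np.asarray(hull_points, dtype=float)
    x = np.asarray(x, dtype=float)
    m, dim = c_pts.shape
    # Variables: [alpha, lambda_1, ..., lambda_m]; linprog minimizes, so use -alpha.
    cost = np.zeros(m + 1)
    cost[0] = -1.0
    # Equalities: sum_i lambda_i c_i - alpha x = 0 and sum_i lambda_i = 1.
    a_eq = np.zeros((dim + 1, m + 1))
    a_eq[:dim, 0] = -x
    a_eq[:dim, 1:] = c_pts.T
    a_eq[dim, 1:] = 1.0
    b_eq = np.zeros(dim + 1)
    b_eq[dim] = 1.0
    bounds = [(0, None)] * (m + 1)  # alpha >= 0 and lambda_i >= 0
    result = linprog(cost, A_eq=a_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not result.success:
        raise ValueError(result.message)
    return result.x[0]

# Toy hull: a square around the origin standing in for C.
square = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
print(cha_alpha([0.5, 0.0], square))  # point inside the hull, alpha >= 1
print(cha_alpha([2.0, 0.0], square))  # point outside the hull, alpha < 1
```

In this toy setting a point inside the hull yields $\alpha \geq 1$ and a point outside yields $\alpha < 1$, mirroring the threshold rule stated above. For real data the rows of `hull_points` would be feature vectors of sampled pure product states.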

C. Overview of Bagging and Boosting Classifiers
A bagging classifier is an ensemble meta-estimator that fits each base classifier to a random subset of the original dataset and then aggregates the individual predictions (either by voting or by averaging) to provide a final prediction. By adding randomization to the process of building a black-box estimator (such as a decision tree), a meta-estimator of this kind can often be used to lower the variance of the estimator.
For each base classifier, a training set is created by randomly selecting M instances (or pieces of data) from the original training dataset (of size N), and the base classifiers are trained in parallel. Each base classifier's training set is distinct from the others. In the resultant training set, many of the original data points might be replicated while others might be left out. An overview of Bagging classifiers is presented in Fig. 1.
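The sampling and aggregation steps of bagging can be sketched as below. The sketch assumes sampling with replacement (as suggested by the remark that some data points are replicated) and majority voting over 0/1 labels; the function names and the tie-breaking rule are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def bootstrap_indices(n_total, n_draw, rng):
    """Draw n_draw indices from range(n_total) uniformly, with replacement."""
    return [rng.randrange(n_total) for _ in range(n_draw)]

def majority_vote(predictions):
    """Aggregate the 0/1 predictions of the base classifiers for one sample."""
    counts = Counter(predictions)
    # Ties are broken in favour of label 1 (an arbitrary choice for this sketch).
    return 1 if counts[1] >= counts[0] else 0

rng = random.Random(0)
n_samples = 10
bags = [bootstrap_indices(n_samples, n_samples, rng) for _ in range(3)]
for bag in bags:
    # Indices never drawn for this bag are "left out" of that learner's training set.
    left_out = sorted(set(range(n_samples)) - set(bag))
    print(sorted(bag), "left out:", left_out)

print(majority_vote([1, 0, 1]))
```

Each bag would be used to train one base classifier independently, which is what makes the procedure easy to parallelize.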
Boosting is a broad ensemble approach in which a number of weak classifiers are combined to produce a strong classifier. To do this, a model is first constructed using the training data, and a second model is then developed in an effort to fix the errors of the first model. Models are added until the training set is predicted exactly or a predetermined number of models is reached, whichever comes first. AdaBoost [29] was the first really successful boosting algorithm developed for binary classification. An overview of Boosting classifiers is presented in Fig. 2.
Both boosting and bagging fall under the category of "ensemble learning", which combines many weak learners to create a hybrid classification system. Most often, "ensemble learning" refers to ensembles of trained weak decision trees.

D. Imbalanced dataset
An imbalanced dataset is one with an unequal distribution of class samples. Such an unequal distribution reduces the training performance of classifiers, and hence the classification results on the testing data are also affected.
In the present context, the volume of entangled states is far larger than that of the separable states, making the dataset imbalanced. For more details on the experimented datasets, see Section IV A. From the discussion in Section IV A, we can observe that the prevalence differences are high for both datasets and hence they are highly imbalanced.
This demands a classifier that can handle data imbalance and is therefore more suitable for quantum separability-entanglement classification problems; such a classifier is discussed in the next section.
Also, for such imbalanced datasets, the learning performance of any ML approach is greatly affected [30] and needs a careful performance evaluation. Such performance measures are discussed in Section IV B.

E. Ensemble classifiers for imbalanced dataset
It has been well studied that, for imbalanced data, the SVM classifier may be biased towards the majority class [31]. A modification of SVM has already been presented that incorporates random under-sampling (RUS) for an unbalanced dataset [32] by randomly removing samples from the training set. For highly unbalanced data, the synthetic minority oversampling technique (SMOTE) [33,34] has been applied to classification; it over-samples the minority class by creating synthetic data points. Incorporating SMOTE into the Boosting approach may therefore be effective for classification. However, when oversampling is performed by duplicating examples, it may lead to over-fitting [35]. Instead of over-sampling the minority class, under-sampling the majority class may also help to improve the classifier results. RUS randomly removes examples from the majority class until the desired class distribution is reached [36]. Its integration with Boosting is RUSBoost [36], a hybrid approach combining random under-sampling, SMOTE, and the Adaptive Boost (AdaBoost) classifier.
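Random under-sampling itself is simple to sketch. The version below assumes a binary problem with labels 0 and 1 and a target class ratio of 1:1; the desired class distribution is a free parameter in general, and the function name and toy data are hypothetical.

```python
import random

def random_under_sample(samples, labels, rng):
    """Randomly drop majority-class samples until both classes have equal size."""
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # The shorter index list is the minority class, the longer one the majority.
    minority, majority = sorted(by_class.values(), key=len)
    kept_majority = rng.sample(majority, len(minority))
    kept = sorted(minority + kept_majority)
    return [samples[i] for i in kept], [labels[i] for i in kept]

# Toy data mimicking a 7% / 93% class split (1 = separable, 0 = entangled).
labels = [1] * 7 + [0] * 93
samples = list(range(100))
new_samples, new_labels = random_under_sample(samples, labels, random.Random(1))
print(new_labels.count(1), new_labels.count(0))
```

Because whole majority examples are discarded rather than minority examples duplicated, no sample appears twice in the reduced set, at the cost of ignoring part of the available data in each round.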
For ensemble learning, bagging and boosting are generally applied (see Fig. 1 and Fig. 2). The Bagging-based CHA (BCHA) has already been proposed [1], reporting higher accuracy than CHA. However, as the data is highly unbalanced, the accuracy evaluation should be twofold: 1) overall accuracy (OA) and 2) average accuracy (AA). For more details on the performance measures OA and AA, see Section IV B. OA is the number of correctly classified test samples divided by the total number of samples under test, while AA is the sum of the accuracies for each class divided by the total number of classes (the average of the per-class accuracies). Although the OA reported in Ref. [1] is higher, we evaluated the AA of BCHA and found it to be lower than that of the CHA approach. This demands a further improvement in the classifier that takes care of both OA and AA for separability-entanglement classification.
As the experimented dataset is highly unbalanced (refer to Section IV A), the RUSBoost approach is explored for separability-entanglement classification and is validated against the state-of-the-art approaches. The subsequent section describes the RUSBoost ensembled CHA classifier.

III. RUSBOOST CHA (RUSBCHA)
Initially, all examples in the training dataset are assigned equal weights. During each iteration of AdaBoost, a weak hypothesis is formed by the base learner. The error associated with the hypothesis is calculated, and the weight of each example is adjusted such that wrongly classified examples have their weights increased while correctly classified examples have their weights decreased. Therefore, subsequent iterations of boosting will generate hypotheses that are more likely to correctly classify the previously mislabeled examples. After all iterations are completed, a weighted vote of all hypotheses is used to assign a class to the unlabeled samples.
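One round of the weight update described above can be sketched as follows. The formulas used are those of the standard binary AdaBoost scheme and are an assumption of this sketch; the exact variant used inside RUSBoost [36] may differ in detail. The function name and the toy weights are hypothetical.

```python
import math

def adaboost_reweight(weights, is_wrong):
    """One AdaBoost round: return (learner_weight, updated_sample_weights)."""
    # Weighted error of the current weak hypothesis.
    error = sum(w for w, wrong in zip(weights, is_wrong) if wrong)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    learner_weight = 0.5 * math.log((1.0 - error) / error)
    # Increase the weights of misclassified examples, decrease the others.
    updated = [
        w * math.exp(learner_weight if wrong else -learner_weight)
        for w, wrong in zip(weights, is_wrong)
    ]
    total = sum(updated)
    return learner_weight, [w / total for w in updated]

# Five equally weighted examples; the weak hypothesis gets only the first one wrong.
weights = [0.2] * 5
is_wrong = [True, False, False, False, False]
learner_weight, new_weights = adaboost_reweight(weights, is_wrong)
print(learner_weight, new_weights)
```

After the update the misclassified example carries half of the total weight, so the next weak learner is pushed to classify it correctly. The value `learner_weight` is the weight of this hypothesis in the final weighted vote.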

SMOTE adds new artificial minority examples by extrapolating between preexisting minority instances rather than simply duplicating original examples. The newly created instances cause the minority regions of the feature space to be fuller and more general.
RUSBoost takes advantage of all these approaches by combining them. A detailed discussion of the RUSBoost approach can be found in [36].
Although a significant improvement in classifier performance is observed [1] for BCHA as compared to standalone CHA, some limitations exist, which are discussed in Section I. It can be further improved in two ways: 1) by replacing the classifier and 2) by enlarging the feature space through a proper feature extraction technique. Presently, the first option is explored by incorporating the RUSBCHA classifier for possible improvement in the classification results, leaving the exploration of feature extraction techniques as future work.

IV. EXPERIMENTAL SETUP
All the classifications were carried out on two kinds of feature spaces: 1) the vector representation of $\rho$ ($d^2 - 1$ dimensional feature space), and 2) the vector representation of $\rho$ together with the CHA-calculated $\alpha^{m}_{\max}$ for a specific $m$ ($d^2$ dimensional feature space). The experiments are carried out for both the two-qubit and two-qutrit systems. Five different techniques were tested: Bagging and Boosting on the raw $d^2 - 1$ dimensional feature vector $x$ (for the two-qubit system $d = 4$ and for the two-qutrit system $d = 9$), CHA with only $\alpha^{m}_{\max}$, and BCHA and RUSBCHA trained with both $x$ and $\alpha^{m}_{\max}$. Their associated feature spaces are presented in Table I. The dataset details and the performance evaluators are presented below.

A. Dataset preparation
The total data space $\Omega$ is a combination of the separable subspace $\Omega_1$ and the entangled subspace $\Omega_0$, such that $\Omega = \Omega_1 \cup \Omega_0$ and $\Omega_1 \cap \Omega_0 = \emptyset$ (see Fig. 3). Two datasets, representing the feature vectors of random density matrices for two-qubit and two-qutrit systems respectively, are supplied with their class labels in [37]. The procedure for creating the random separable and entangled states is described in the BCHA manuscript [1]. The total and class-specific training and testing sample information for the two experimented datasets, namely the two-qubit and two-qutrit systems, is presented in Table II and Table III respectively. Approximately 50% of the samples are randomly selected for training and the remaining 50% are used for testing to evaluate the performance of the ML algorithms.
From Table II and Table III, we can observe that the class samples are unequally distributed within the datasets. The prevalence difference for a binary classification represents the degree of imbalance in the dataset. For a balanced dataset, the prevalence difference approaches 0. However, we can observe that the prevalence difference for the two-qubit dataset is high (0.86), while for the two-qutrit dataset it is comparatively low (0.32). This clearly signifies that the experimented datasets are highly imbalanced. For such imbalanced datasets, the learning performance of any ML approach is greatly affected [30] and needs a careful performance evaluation. Such performance measures are discussed further.
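A minimal sketch of the prevalence difference is given below. It assumes the definition $|N_0 - N_1| / (N_0 + N_1)$, with $N_1$ separable and $N_0$ entangled samples; this definition is an assumption of the sketch, chosen because it reproduces the reported value of about 0.86 for a dataset with roughly 7% separable samples. The class counts used are illustrative and are not the entries of Table II.

```python
def prevalence_difference(n_separable, n_entangled):
    """Absolute difference of the two class prevalences (assumed definition)."""
    total = n_separable + n_entangled
    return abs(n_entangled - n_separable) / total

# Illustrative counts: 7 separable and 93 entangled samples out of 100.
print(round(prevalence_difference(7, 93), 2))   # 0.86
# A perfectly balanced dataset.
print(prevalence_difference(50, 50))            # 0.0
```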

B. Performance measures
For ease of understanding the binary classification, the confusion matrix is presented in Fig. 4. In the figure, columns represent the original class labels (supplied with the data) as true and false, similarly each row represents the outcome of the classifier.
True positive (TP) and true negative (TN) denote the cases in which the original (ground truth) and the obtained (classified) labels agree, as true and as false respectively. For binary classification, let there be $N_1$ and $N_2$ samples labeled as true and false respectively among the $N$ tested samples (where $N = N_1 + N_2$). The average accuracy (AA) is the mean of the accuracies obtained for each class and is defined as $AA = \frac{1}{2}\left(\frac{TP}{N_1} + \frac{TN}{N_2}\right)$, and the average error (AE) as $AE = 1 - AA$.
Similarly, other important measures such as sensitivity ($s = \frac{TP}{TP + FN}$), specificity ($r = \frac{TN}{TN + FP}$), precision ($k = \frac{TP}{TP + FP}$), F-measure, and G-mean can be incorporated for validating the classification results. We use the F-measure and G-mean for our analysis. Higher values of OA, AA, F-measure, and G-mean are desirable when evaluating the performance of a classifier.
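The measures above can be computed directly from the four confusion-matrix counts, as in the following sketch. The F-measure and G-mean are taken in their commonly used forms, $2ks/(k+s)$ and $\sqrt{s\,r}$, which is an assumption of this sketch; the function name and the example counts are hypothetical and do not come from the experiments.

```python
import math

def classification_measures(tp, fn, fp, tn):
    """OA, AA, F-measure and G-mean from binary confusion-matrix counts."""
    n_pos = tp + fn  # samples whose original label is true
    n_neg = tn + fp  # samples whose original label is false
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    precision = tp / (tp + fp)  # undefined if nothing is predicted as true
    return {
        "OA": (tp + tn) / (n_pos + n_neg),
        "AA": 0.5 * (sensitivity + specificity),
        "F-measure": 2 * precision * sensitivity / (precision + sensitivity),
        "G-mean": math.sqrt(sensitivity * specificity),
    }

# Hypothetical counts for a classifier biased towards the majority (false) class.
measures = classification_measures(tp=2, fn=5, fp=1, tn=92)
for name, value in measures.items():
    print(name, round(value, 3))
```

With such counts the overall accuracy stays high while AA, F-measure, and G-mean drop, which illustrates why OA alone can be misleading on imbalanced data.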

V. RESULTS AND DISCUSSION
We used both datasets (see Section IV A) and all the performance measures described in Section IV B to compare the proposed RUSBCHA with other state-of-the-art classifiers in terms of figures. For a robust representation of the performances on the experimented data, all the classification performance measures are averaged over 30 independent evaluations.
The Bagging and Boosting classifiers incorporate only the $d^2 - 1$ dimensional feature vector $x$. The classification performances in terms of AE, F-measure, G-mean, and OE for the two-qubit and two-qutrit systems are presented in Fig. 5(a) and Fig. 5(b) respectively. For the two-qubit system (Fig. 5(a)), it is observed that the proposed Boosting approach outperforms the Bagging approach in terms of F-measure, G-mean, and AE, while only a marginal deviation is observed for OE. Similarly, for the two-qutrit system (Fig. 5 [...] approaches, if $\alpha^{m}_{\max} \geq 1$, $x$ is separable; otherwise, $x$ is very likely an entangled state. Hence, our proposed RUSBCHA classifier also incorporates both the feature vector $x$ and $\alpha^{m}_{\max}$. To find the trade-off between the state-of-the-art BCHA and the proposed RUSBCHA approach, further experiments are made on both the two-qubit and two-qutrit datasets. These experiments include:
• Experiment 1: Performance evaluation of classifiers over varying m.
• Experiment 2: Performance evaluation of classifiers over varying percentages of training and testing samples.
• Experiment 3: Performance evaluation of classifiers on varying prevalence difference of dataset.

A. Experiment 1
In this experiment, the CHA, BCHA, and proposed RUSBCHA classifiers are compared over varying m for both the two-qubit and two-qutrit datasets. Experimental results are shown in Fig. 6 and Fig. 7.
For the two-qubit system, it can be observed from Fig. 6(b) that the AE of BCHA is higher for all values of m as compared to the CHA and RUSBCHA approaches. BCHA has almost 40% error for lower values of m. It can also be observed that, for lower values of m, the performances of CHA and RUSBCHA are similar, while for higher values of m RUSBCHA has lower AE values. This signifies that the proposed RUSBCHA is less biased towards the majority class and hence its average accuracy is higher in comparison to the other state-of-the-art approaches. A similar interpretation can be drawn from Fig. 6(d).
From Fig. 6(a), it can be observed that the OE of BCHA has lower values, and hence its performance is better for lower values of m in comparison to the RUSBCHA and CHA approaches, while the proposed RUSBCHA has intermediate performance in comparison to the other state-of-the-art approaches. However, in Fig. 6(c), the F-measure performances are similar for all approaches. On the other hand, for the two-qutrit system (Fig. 7), both BCHA and RUSBCHA have similar performances over varying m, with significant performance improvements as compared to the state-of-the-art CHA approach. In this experiment, one can observe better performance of the proposed RUSBCHA approach for the two-qubit dataset in comparison to the BCHA and CHA approaches, while similar performances are observed for RUSBCHA and BCHA on the two-qutrit dataset. To find the rationale for the performance differences between these two datasets, further experiments are carried out.

B. Experiment 2
It is reported in the literature that several machine learning techniques, such as neural networks and deep learning, require a large number of samples to train. The performance differences observed above may therefore be due to the sensitivity of the classifiers to the percentage of training samples. In Experiment 1, 50% of the samples are used for training and the rest for testing. Hence, further validation of the approaches is carried out with varying training (10%-50%) and testing (90%-50%) proportions, and the performances are presented in Fig. 8 and Fig. 9 for the two-qubit and two-qutrit systems respectively. Note that, for this experiment, the total samples are the same as in Table II and Table III for the respective datasets. In this experiment, m is set to 2000 and 20000 for the two-qubit and two-qutrit data respectively. From Fig. 8(a), it can be observed that the OA of BCHA is 2.5% higher than that of RUSBCHA, while in Fig. 8(b) the AA of RUSBCHA is more than 15% better than that of BCHA. However, the results of these classifiers do not vary with the training percentage. Therefore, the performance of both classifiers is not sensitive to the number of training samples. For the two-qutrit data, similar results can be observed in Fig. 9(a) and Fig. 9(b). However, the AA performances in Fig. 8(b) and Fig. 9(b) suggest that RUSBCHA performs better than BCHA, specifically for the two-qubit dataset. Note in this respect that the prevalence difference of the two-qutrit dataset (0.3249) is comparatively low with respect to that of the two-qubit dataset (0.8593) for this experiment. This suggests that further experiments testing both classifiers with varying prevalence differences might provide some clue on how these classifiers work for imbalanced datasets.

C. Experiment 3
The above experiments were performed with the two-qubit and two-qutrit datasets described in Table II and Table III respectively. From these tables, one can observe that the separable samples are only 7% and 33% of the total samples for the two-qubit and two-qutrit datasets, respectively. To test the performance of the classifiers for different prevalence differences, we created imbalanced datasets of different prevalence differences for both the two-qubit and two-qutrit systems. Table IV describes the imbalanced datasets created for two qubits. In this table, each row describes a dataset that is a subset of the dataset described in Table II. For each created subset, the numbers of separable, entangled, and total samples are given, together with the prevalence difference of the respective dataset. The prevalence difference values range approximately from 0 to 0.9. The value 0 represents a balanced dataset, and the value 0.9 represents a highly imbalanced dataset. A similar interpretation for the two-qutrit dataset can be made from Table V.
Fig. 10 shows the classifier performances over varying prevalence differences of the two-qubit data. In the figure, the performances are averaged over 30 iterations, and in each iteration a new subset of the dataset is created for each prevalence difference (Table IV). For this experiment, we fixed m=2000 and used 50% of the samples for training.
It is observed from Fig. 10(a) that the OA of BCHA and RUSBCHA are similar up to a prevalence difference of 0.6. Afterward, however, there is a minor improvement of OA for the BCHA approach in comparison to the RUSBCHA approach.
Fig. 11 shows the classifier performances over varying prevalence differences of the two-qutrit data. In the figure, the performances are averaged over 30 iterations, and in each iteration a new subset of the dataset is created for each prevalence difference (Table V). For this experiment, we fixed m=20000 and used 50% of the samples for training.
It is observed from Fig. 11(a) that the OA of BCHA and RUSBCHA are similar up to a prevalence difference of 0.3. Afterward, however, there is a minor improvement of OA for the BCHA approach in comparison to the RUSBCHA approach. From Fig. 11(b), it can be observed that the BCHA and RUSBCHA performances are similar up to a prevalence difference of 0.25. Afterward, however, there is a sharp decline of AA for BCHA in comparison to RUSBCHA. From the results in Fig. 10 and Fig. 11, it can be observed that the performance of the proposed RUSBCHA approach is consistent (almost a straight line) over varying prevalence differences of the data. So, it can be concluded that the performance of RUSBCHA is not heavily affected by data imbalance.
Our earlier observations, namely the better AA of the proposed RUSBCHA over BCHA in Fig. 6 and the similar performances of RUSBCHA and BCHA in Fig. 7, can now be justified using Fig. 10 and Fig. 11 respectively. Since the prevalence difference of the two-qubit data is 0.8593, our proposed RUSBCHA performs better than BCHA, while the prevalence difference of the two-qutrit data is 0.3249 and hence the BCHA and RUSBCHA performances are similar. We can therefore conclude that RUSBCHA can be an alternative to the BCHA approach and can also be a better classifier for dealing with highly imbalanced datasets. Overall, ensemble learning is helpful for a better understanding of the separability-entanglement problem when compared to the stand-alone CHA approach.

VI. CONCLUSION
The necessity of a separability-entanglement classifier is well known in the quantum information community. Although various necessary and sufficient criteria like PPT have been proposed in the past, they still cannot be generalized to higher dimensions. ML approaches are widely exploited in general data mining, while their discussion and application in quantum information processing remain limited. Similar to BCHA, we proposed RUSBCHA as an alternative ML-based solution for the quantum separability problem. The proposed RUSBCHA approach shows improvements in AE for the two-qubit system, while having responses similar to BCHA for the two-qutrit system. As the data is highly unbalanced, standard performance measures like OE, AE, F-measure, and G-mean are evaluated. The results suggest incorporating a proper ML approach with proper performance metrics for separability-entanglement classification. Also, the proposed RUSBCHA can be an alternative to CHA that can deal with unbalanced datasets, which may reduce the over-fitting error of the classifier.
To isolate the effectiveness of the classifier, feature extraction is not explored here; however, it can be a further direction of research to improve the classification performance. Also, other ML approaches can be exploited and validated further.

[...] defined on the simplex $\sum_{i}^{d} \ell_i = 1$, where $\theta > 0$ is a parameter and $C_\theta$ is a normalization constant. We set $\theta = \frac{1}{2}$ for sampling both the two-qubit and two-qutrit states.
Note that our dataset is exactly the same as that used in Ref. [1]. Ref. [1] observed the following trends during training using the generated samples:
• For the two-qubit case, approximately 7% of the states among $5 \times 10^4$ are PPT, i.e., separable states.
• Among fairly large samples (randomly generated) of two-qutrit states, only 2.2% are PPT. After rejecting all the states with negative partial transpose while sampling, as they are assumed entangled (prior information), the collected PPT states amount to a total of $2 \times 10^4$ samples. Among the PPT states, at least 66.24% are found to be separable using CHA. However, note that during testing, NPT states are also included.
The authors in Ref. [1] observe that these trends are consistent with the previously predicted ones in Ref. [38].

FIG. 1: Overview of Bagging classifier: Multiple learners are created by generating additional data points. The new data points are created randomly with a uniform probability as before. Generally, the created N learners are parallel and are further averaged to obtain the final learning error defined as $e = \frac{1}{N} \sum_{i=1}^{N} e_i$.

Data sampling techniques attempt to alleviate the problem of class imbalance by adjusting the class distribution of the training dataset. This can be accomplished by either removing examples from the majority class (under-sampling) or adding examples to the minority class (oversampling).

FIG. 2: Overview of Boosting classifier: Similar to the Bagging approach, the Boosting classifier also generates multiple data points. But, unlike the parallel learners in Bagging, the Boosting approach sequentially learns the error from the previous learner and assigns a higher weight to the misclassified data, and random sampling with weighted replacement is carried out. Also, another set of weights assigned to the learners is further accumulated to find the final weighted average error defined as $e = \sum_{i=1}^{N} w_i e_i$.

FIG. 3: Data space $\Omega$ as a combination of entangled $\Omega_0$ and separable $\Omega_1$ subspaces. $c_i$ represents the pure product states.

FIG. 5: Classification results of the raw data without considering the CHA ($\alpha$) for (a) two-qubit and (b) two-qutrit system.

FIG. 9: Obtained Overall Accuracy and Average Accuracy for the two-qutrit systems over varying percentage (%) of training samples (m=20000).

TABLE I: Various experimented classifiers with their associated feature space (dimensions).

TABLE II: Dataset description of experimented training, testing, and total samples for two-qubit systems.

TABLE III: Dataset description of experimented training, testing, and total samples for two-qutrit systems.

TABLE IV: Description of imbalanced datasets created from the original two-qubit dataset of Table II.

TABLE V: Description of imbalanced datasets created from the original two-qutrit dataset of Table III.