Machine Learning Techniques for the Performance Enhancement of Multiple Classifiers in the Detection of Cardiovascular Disease from PPG Signals

Photoplethysmography (PPG) signals are widely used in clinical practice as a diagnostic tool, since PPG is noninvasive and inexpensive. In this article, machine learning techniques were used to improve the performance of classifiers for the detection of cardiovascular disease (CVD) from PPG signals. Because PPG signals occupy a large amount of memory, the signals were dimensionally reduced in the initial stage. A total of 41 subjects from the CapnoBase database were analyzed in this study, including 20 CVD cases and 21 normal subjects. The PPG signals were sampled at 200 samples per second, so 144,000 samples per patient were available. A one-second-long PPG signal was considered a segment, giving 720 PPG segments per patient and, for all 41 subjects, 29,520 segments of PPG signals analyzed in this study. Five dimensionality reduction techniques, namely heuristic-based (ABC-PSO, cuckoo clusters, and dragonfly clusters) and transformation-based (Hilbert transform and nonlinear regression) techniques, were used in this research. Twelve different classifiers, namely PCA, EM, logistic regression, GMM, BLDC, firefly clusters, harmony search, detrend fluctuation analysis, PAC Bayesian learning, KNN-PAC Bayesian, the softmax discriminant classifier, and detrend with SDC, were utilized to detect CVD from the dimensionally reduced PPG signals. The performance of the classifiers was assessed based on metrics such as accuracy, performance index, error rate, and good detection rate. The Hilbert transform technique with the harmony search classifier outperformed all other combinations, with an accuracy of 98.31% and a good detection rate of 96.55%.


Introduction
PPG has proven to be an effective tool for the early diagnosis of cardiac disorders. Using PPG, blood volume fluctuations in tissues are continuously measured. PPG serves as a promising technique for the thorough screening of cardiovascular diseases (CVDs). Blood circulation from the heart to the toes and fingertips is measured using PPG signals [1]. PPG sensors typically have an infrared-operating avalanche photodiode as the detector and a light-emitting diode (LED) as the source [2]. Both the European Committee for Standardization and the International Organization for Standardization (ISO) have recognized PPG as a standard noninvasive procedure for measuring and analyzing blood oxygen saturation levels. CVD leads to changes in the heart rate. PPG is skin-friendly, since it does not require direct skin-to-surface contact. Cardiac functions, such as blood flow, heart rate, and mean circulation time, are measured using PPG signals [3], since heart rate is associated with multiple physiological variables connected with hormonal and neuronal disturbances, pumping mechanisms, and mean blood circulation time. Each record in the CapnoBase database has a duration of 8 min. This database consists of annotated respiratory signals, such as pressure, respiratory flow, and inhaled and exhaled carbon dioxide (capnogram). All 41 records were considered in this investigation. Twenty-one of the forty-one cases are normal, while the other twenty have cardiovascular disease. The PPG signals were sampled at a rate of 300 Hz, yielding 144,000 samples per record, divided into 720 segments. Each of these segments had 200 samples at equal intervals.
Independent component analysis (ICA) was used to remove the noise components in the PPG signals. The classification of a PPG signal involved two steps: First, dimensionality reduction (DR) was achieved with the help of heuristic-and transformation-based techniques. Second, these dimensionally reduced values were classified using various classifiers to detect whether the corresponding PPG signal was associated with a person with CVD or a normal subject. After the implementation of the dimensionality reduction techniques, the original PPG samples of a patient (200 × 720) were reduced to 100 × 720. These dimensionally reduced samples (100 × 720) were input into the classifiers for further classification. The organization of the CVD detection from the PPG signals is depicted in Figure 1.

Dimensionality Reduction Methods
Dimensionality reduction (DR) is a preprocessing technique used to eliminate irrelevant data and redundant features in order to reduce the training time of PPG signals [26]. All machine learning techniques and models become increasingly challenging to apply as the dimensions of the input dataset increase. Dimensionality is a problem in PPG signals with large amounts of data: as the number of features increases, the number of samples required also increases proportionately, and so does the risk of overfitting. When a machine learning model is trained on such large datasets, it becomes redundant and delivers mediocre performance. As a result, it is necessary to decrease the number of features, which can be accomplished through dimensionality reduction. In this way, the DR technique is used to prevent data overfitting and to select the most informative characteristics for classification purposes [27]. In this research, five dimensionality reduction techniques, divided into two categories, were utilized. First, the transformation-based techniques include the Hilbert transform and NLR optimization. Second, the heuristic-based techniques include ABC-PSO, cuckoo search, and dragonfly optimization. These methods are discussed in the following sections.

Hilbert Transform
The Hilbert transform (HT) is a mathematical operation that is used to obtain the analytical representation of a real-valued signal.
The Hilbert transform [28] of a signal y(t) is given by

ŷ(t) = (1/π) ∫ y(τ)/(t − τ) dτ

It can be observed from the above equation that this transformation has no effect on the independent variable; therefore, the output ŷ(t) is also a function that changes over time. Furthermore, the output ŷ(t) is a linear function of the input y(t). It is produced by applying convolution to y(t) with (πt)⁻¹, as shown in the equation below:

ŷ(t) = y(t) ∗ (1/πt)

By applying the Fourier transform (FT) to the above equation, we obtain

Ŷ(f) = −j·sgn(f)·Y(f)

A phase shift of −90 degrees is produced on all positive frequency components of the real-valued signal y(t), and +90 degrees on all negative frequency components, when the Hilbert transform is applied to y(t). The domain of the signal is not changed by the HT. When the Hilbert transform is applied to two different signals with the same amplitude but different phases, the magnitude spectrum is the same, because the transform does not change the magnitude spectrum but only the phase spectrum. This phase response can be observed directly in the spectral analysis of the transformed signal. In signal processing, the Hilbert transform is frequently used to generate the analytic representation of a real-valued signal y(t). All Fourier transformable signals are Hilbert transformable [29].
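As a brief illustration (using a synthetic sine wave as a stand-in for a PPG segment, with an assumed 300 Hz sampling rate), the analytic representation can be obtained with `scipy.signal.hilbert`:

```python
import numpy as np
from scipy.signal import hilbert

# One 1-second synthetic segment; fs and the waveform are illustrative
fs = 300                          # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
y = np.sin(2 * np.pi * 1.2 * t)   # ~72 bpm fundamental, stand-in for PPG

analytic = hilbert(y)             # y(t) + j * HT{y(t)}
envelope = np.abs(analytic)       # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))

y_ht = np.imag(analytic)          # the Hilbert transform itself
```

The envelope and instantaneous phase obtained this way are the kinds of compact descriptors that make the Hilbert transform useful as a dimensionality reduction step.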

Nonlinear Regression
Nonlinear regression (NLR) is a statistical technique that involves developing a regression model to represent a nonlinear relationship between independent and dependent variables. The fundamental concept of linear regression and nonlinear regression is the same, that is, to connect a response, R, to a set of predictors, Z = (z₁, z₂, …, zₙ)ᵀ. The prediction equation for nonlinear regression depends nonlinearly on one or more unknown parameters. Typically, nonlinear regression is used when there is a specific functional shape in the relationship between the predictors and the response. The main goal of this model is to achieve a low sum of squares. The sum of squares measures how far the observations deviate from the model: the deviations between the fitted values and the individual points in the dataset are squared and then added together. The model best matches the dataset points when the sum of the squared deviations is low. The parameters of a nonlinear model are computed using the least squares method, whose equations contain nonlinear elements. The steepest descent, Taylor series, and Levenberg-Marquardt methods can be used to solve these kinds of equations, with the Levenberg-Marquardt algorithm being the most widely used for nonlinear least squares. The advantages of this strategy include optimal feature selection and reliable model convergence through iterations.
The structure of a nonlinear regression model, as shown in [30], is

Rᵢ = f(Zᵢ, ϕ) + eᵢ

where Rᵢ are the responses; f is a known function of the covariate vector Zᵢ = (z_i1, z_i2, …, z_in)ᵀ and the parameter vector ϕ = (ϕ₁, ϕ₂, …, ϕₙ)ᵀ; and eᵢ represents the random error values. Typically, the random errors are assumed to be uncorrelated with zero mean and constant variance. The residual sum of squares is expressed as

S(ϕ) = ∑ᵢ (Rᵢ − f(Zᵢ, ϕ))²
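A minimal sketch of such a fit, assuming a simple exponential model f (not the paper's exact model), can be written with `scipy.optimize.curve_fit`, which uses the Levenberg-Marquardt algorithm when `method="lm"`:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed exponential model f(Z, phi); the true parameters are (2.0, -1.5)
def f(z, phi1, phi2):
    return phi1 * np.exp(phi2 * z)

rng = np.random.default_rng(0)
z = np.linspace(0, 1, 50)
r = f(z, 2.0, -1.5) + 0.01 * rng.standard_normal(50)  # responses with noise e_i

# Levenberg-Marquardt nonlinear least squares
phi_hat, _ = curve_fit(f, z, r, p0=[1.0, -1.0], method="lm")

# residual sum of squares S(phi)
rss = np.sum((r - f(z, *phi_hat)) ** 2)
```

With low noise, the estimated parameters recover the true values closely and the residual sum of squares is small.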

ABC-PSO
ABC-PSO stands for artificial bee colony-particle swarm optimization. It is a hybrid optimization algorithm that combines the artificial bee colony (ABC) algorithm and the particle swarm optimization (PSO) algorithm to enhance the search capability and convergence speed of the optimization process. The ABC method has the advantages of simplicity, flexibility, and fast achievement of good results for multidimensional datasets. Notably, the ABC algorithm does not necessarily need to use the population's global best solution to locate new food sources. Meanwhile, PSO particles may not be able to escape from local minima by performing a random search in the way that scout bees do in ABC. In addition, the update equation in ABC updates a single variable instead of all variables as in PSO. In order to achieve the best results, the ABC algorithm has been combined with the PSO search algorithm. In ABC-PSO, three ABC phases are used, and the employed bee phase uses velocity and the PSO method of locating new food sources. The best location currently visited by an individual is updated after the position of a new food source is updated. The food source's trial counter is reset if the current best position changes; otherwise, its value is increased by 1. Onlooker bees memorize their positions throughout the employed bee phase and search for new food sources based on their knowledge of the best food source locations that the employed bees have visited. The new candidate solution z′ is generated by utilizing the ABC update equation, as follows:

z′_mn = z_mn + ∅_mk (z_mn − z_ml)

where z_mn is the mth dimension of the nth employed bee; k is the random index; l is the index of a randomly selected individual; and ∅_mk is a random number between −1 and 1. The food source's trial counter is reset if the new food source location has a better value; otherwise, it is increased by 1. The scout bee phase follows the ABC algorithm [31].
ABC-PSO Algorithm:
1. Initialize the swarm.
2. Update the velocity and position of each particle by performing the employed bee phase.
3. Update the local best position of each particle by finding its new position in the onlooker bee phase.
4. If the highest trial counter value for any food source is higher than the limit, a scout bee searches for a new food source site.
5. At this point, instead of using scout bees, the PSO algorithm is used to look for new food sources.
6. Particles with random placements are used to initialize the population of new food sources.
7. The fitness value is determined for all particles for the specific objective function.
8. The fitness function is used to select the optimal set of features. The expression for the fitness function is as follows:

Fitness(K) = a·ϕₖ + (1 − a)·(1 − b/T)

where ϕₖ is the classifier performance in subset K; b is the feature subset length; T is the total number of features; and a weights the classification quality.
9. The number of particles currently present is set as pbest.
10. A new set of particles is created by adding velocity to the initial particles, and the fitness value is calculated for them.
11. A new pbest is discovered between the two particle sets by comparing the fitness values of each particle.
12. The least fitness value is determined by comparing the two sets of particles, and the corresponding particle is then referred to as the gbest.
13. Simultaneously, in the next iteration, the velocity v_{q+1} and position x_{q+1} are updated as follows:

v_{q+1} = ω·v_q + a·r₁·(pbest − x_q) + b·r₂·(gbest − x_q)
x_{q+1} = x_q + v_{q+1}

where r₁ and r₂ are random numbers in [0, 1] and ω is the inertia weight. The maximum step size that a particle can take in each iteration is influenced by the acceleration coefficients a and b.
14. The PSO iterations are continued until convergence is reached.
15. The finest food source is identified and remembered.
16. The process is repeated as many times as necessary to fully satisfy the stopping criteria.
The main goal of this hybridization is to combine the elements of ABC and PSO so that problems readily addressed by either algorithm can be solved while simultaneously achieving rotationally invariant search behavior.
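The hybrid loop can be sketched as follows; the sphere objective, population size, coefficients, and trial limit are all illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):                       # stand-in objective (assumption)
    return np.sum(x ** 2)

n_bees, dim, limit = 10, 4, 5
pos = rng.uniform(-1, 1, (n_bees, dim))
vel = np.zeros((n_bees, dim))
fit = np.array([sphere(p) for p in pos])
pbest, pbest_fit = pos.copy(), fit.copy()
trial = np.zeros(n_bees)

for _ in range(100):
    gbest = pbest[np.argmin(pbest_fit)]
    # employed-bee phase with a PSO velocity update (the hybrid step)
    vel = (0.7 * vel
           + 1.5 * rng.random((n_bees, dim)) * (pbest - pos)
           + 1.5 * rng.random((n_bees, dim)) * (gbest - pos))
    cand = pos + vel
    # ABC-style single-variable perturbation (onlooker behaviour)
    m = rng.integers(dim, size=n_bees)          # random dimension per bee
    l = rng.integers(n_bees, size=n_bees)       # random partner per bee
    cand[np.arange(n_bees), m] += rng.uniform(-1, 1, n_bees) * (
        cand[np.arange(n_bees), m] - pos[l, m])
    cand_fit = np.array([sphere(c) for c in cand])
    improved = cand_fit < fit                   # greedy selection
    pos[improved], fit[improved] = cand[improved], cand_fit[improved]
    trial = np.where(improved, 0, trial + 1)    # reset or increment counters
    better = fit < pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    # scout phase: abandon exhausted food sources
    scouts = trial > limit
    pos[scouts] = rng.uniform(-1, 1, (scouts.sum(), dim))
    fit[scouts] = np.array([sphere(p) for p in pos[scouts]])
    trial[scouts] = 0

best = pbest_fit.min()
```

The greedy selection and trial counters come from ABC, while the velocity term pulling particles toward pbest and gbest comes from PSO.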

Cuckoo Search
The cuckoo search (CS) algorithm is a nature-inspired metaheuristic optimization algorithm based on the brood parasitism of some cuckoo species, along with Levy flight random walks, established by Xin-She Yang and Suash Deb [32]. This optimization method depends on the brood parasitism behavior of certain cuckoo species along with the Levy flight behavior of certain fruit flies and birds. The three common rules of the CS algorithm can be given as follows:

1. At a certain time, every cuckoo bird lays one egg and dumps it in an arbitrarily selected host nest.
2. The best host nests, containing the top-quality eggs, are carried over to the subsequent generation.
3. There is only a fixed quantity of host nests available, and a host bird can recognize a cuckoo's egg with a probability Pₐ ∈ [0, 1]. In this instance, the host bird either throws the cuckoo's egg away or abandons the nest and builds a new one elsewhere.

When generating a new solution x_i^{t+1} for cuckoo i, a Levy flight is performed:

x_i^{t+1} = x_i^t + β ⊕ Levy(λ)

where ⊕ denotes entry-wise multiplication, and β > 0 is a scaling factor that denotes the step size. Here, the β value is the one considered for optimization. A random walk is provided by the Levy flight, and random step sizes are calculated from the Levy distribution as follows:

Levy ∼ u = t^{−λ}, 1 < λ ≤ 3

This has an infinite variance and mean. The symbol "∼" indicates that the numbers being generated are pseudorandom and drawn from a probability distribution. The consecutive jumps of a CS are basically a random walk procedure that follows a power-law step-length distribution with a heavy tail. Utilizing the Levy walk approach, several new solutions around the current best solution should be produced by wide-area randomization, while the locations of some new solutions should be far from the current best solution. This will prevent the system from becoming stuck at a local optimum.
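The Levy-flight step can be sketched with Mantegna's algorithm, a common way to generate heavy-tailed step sizes (the exponent λ = 1.5 and the step scale β are illustrative assumptions):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(2)

def levy_step(dim, lam=1.5):
    """Mantegna's algorithm for heavy-tailed, Levy-distributed step sizes."""
    sigma = (gamma(1 + lam) * np.sin(np.pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / lam)

# one cuckoo update: x_new = x + beta (entry-wise) Levy step
x = np.zeros(5)
beta = 0.01                        # assumed step-size scaling
x_new = x + beta * levy_step(5)
```

Most steps are small, but the heavy tail occasionally produces long jumps, which is what lets the search escape local optima.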

Dragonfly
The dragonfly algorithm (DA) is a swarm intelligence algorithm inspired by the dynamic and static swarming behaviors of dragonflies. The dragonfly algorithm is a modern heuristic optimization technique created by Mirjalili in 2016 [33]. The static and dynamic swarming behaviors of dragonflies in nature serve as the primary source of inspiration for the algorithm. In the dynamic, or exploitation, phase, a huge number of dragonflies form swarms and travel over long distances in one particular direction to distract their enemies. In the static, or exploration, phase, dragonflies form small groups and move back and forth over a small area to hunt their prey. The five fundamental principles of DA are separation, alignment, cohesiveness, attraction, and diversion. In the following equations, Q and Qⱼ denote the current and jth positions of the individual dragonflies, respectively, and the total number of neighboring flies is denoted by K.

1. Separation: This indicates the static avoidance of flies colliding with other flies in the area. It is calculated as

Sᵢ = −∑ⱼ₌₁ᴷ (Q − Qⱼ)

where Sᵢ denotes the separation motion of the ith individual.
2. Alignment: This signifies the velocity matching among individual flies within the same group. It is denoted as

Aᵢ = (∑ⱼ₌₁ᴷ Vⱼ)/K

where Vⱼ denotes the velocity of the jth individual.
3. Cohesiveness: This denotes the tendency of individual flies to move to the center of the swarm. It is estimated as

Cᵢ = (∑ⱼ₌₁ᴷ Qⱼ)/K − Q

4. Attraction towards the nourishment source is estimated as

Fᵢ = Q⁺ − Q

where Fᵢ denotes the nourishment attraction of the ith individual and Q⁺ is the position of the nourishment source.
5. Diversion: This represents the distraction outwards from the enemy. It is calculated as

Eᵢ = Q⁻ + Q

where Eᵢ denotes the ith individual's enemy diversion and Q⁻ denotes the enemy's position.

Within the search space, the locations of artificial dragonflies are updated by the step vector, ∆Q, and the current position vector, Q. The direction of the dragonfly's movement is indicated by the step vector, ∆Q, and it is evaluated as

∆Q_{t+1} = (s·Sᵢ + a·Aᵢ + c·Cᵢ + f·Fᵢ + e·Eᵢ) + ω·∆Q_t

where s, a, c, f, and e are the separation weight, alignment weight, cohesion weight, attraction weight, and enemy weight, respectively. The inertia weight is denoted by ω, and t denotes the iteration number. The exploration and exploitation phases can be obtained by changing the weights.

At t + 1 iterations, the position of the ith dragonfly is calculated as follows:

Q_{t+1} = Q_t + ∆Q_{t+1}
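The five behaviours and the resulting step-vector update can be sketched in a few lines; the neighbour positions, velocities, food and enemy positions, and all weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
K, dim = 5, 2                        # neighbours and search dimensions (assumed)
Q = rng.uniform(-1, 1, dim)          # current dragonfly position
Qj = rng.uniform(-1, 1, (K, dim))    # neighbouring dragonflies
Vj = rng.uniform(-1, 1, (K, dim))    # neighbour velocities
Qplus, Qminus = np.zeros(dim), np.ones(dim)   # food and enemy positions

S = -np.sum(Q - Qj, axis=0)          # separation
A = Vj.mean(axis=0)                  # alignment
C = Qj.mean(axis=0) - Q              # cohesion
F = Qplus - Q                        # attraction towards food
E = Qminus + Q                       # diversion from the enemy

s, a, c, f, e, w = 0.1, 0.1, 0.7, 1.0, 1.0, 0.9   # assumed weights
dQ = np.zeros(dim)                   # previous step vector
dQ = s * S + a * A + c * C + f * F + e * E + w * dQ   # step-vector update
Q_next = Q + dQ                      # position update
```

Tuning the weights toward cohesion and food attraction drives exploitation, while larger separation and alignment weights favour exploration.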

Statistical Analysis of Dimensionally Reduced PPG Signals
The dimensionally reduced PPG signals were analyzed through the extraction of statistical parameters and sample entropy to ascertain that the PPG signal characteristics were preserved. The statistical features [34], such as the mean, variance, skewness, kurtosis, Pearson correlation coefficient (PCC), and sample entropy [35], were extracted from the dimensionally reduced PPG samples for the CVD and normal classes. This reduced dataset provides the appropriate information through the above features. Table 1 shows the statistical analysis of these parameters for the DR techniques applied to the PPG signals. It is observed from Table 1 that, for normal cases, lower mean values were obtained across the various optimization techniques. For cases of CVD, higher mean values were obtained, except for ABC-PSO, and a negative mean was obtained under dragonfly optimization. The skewness and kurtosis values indicate highly skewed distributions for normal as well as CVD cases. It is inferred from Table 1 that the sample entropy values were the same across all classes, except for the NLR DR method in normal cases and the cuckoo search for the CVD cases. In addition, Table 1 shows that the PCC values were low, which indicates that the optimized features were nonlinear and uncorrelated within the classes. Therefore, it is better to apply nonlinear classifiers to detect the CVD and normal segments of the PPG signals. If the CCA values are greater than 0.5, there is high correlation across the classes. From Table 1, it can be noticed that the Hilbert transform optimization was highly correlated across the classes, while the ABC-PSO optimization method was the least correlated among the classes. Therefore, the above analyses of the dimensionally reduced PPG signals make a strong case for the usage of better classifiers.
In order to identify the presence of nonlinearity in the dimensionally reduced signals, a normal probability plot of the Hilbert transform-based dimensionally reduced values of the PPG signals in a case of CVD (Patient 2) is shown in Figure 2. From Figure 2, it can be observed that the normal plots exhibit the presence of nonlinearity and the overlapping of the Hilbert transformed values of the PPG signals.

Classifiers for Detection of CVD
Classifiers play a vital role in classifying data. An ideal classifier is one that provides high accuracy with a low error rate for a given computational complexity. The following sections of this paper discuss the classifiers that were used for this purpose, the first of which is principal component analysis (PCA). Mathematically, the eigenvalues and eigenvectors of the data covariance matrix are calculated to obtain the principal components. The direction of the largest variation is identified from the eigenvector that has the largest eigenvalue [36].
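As a sketch of the PCA computation described above, on synthetic data with one deliberately correlated feature:

```python
import numpy as np

# Synthetic data: feature 1 is strongly correlated with feature 0
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5))
X[:, 1] = 3 * X[:, 0] + 0.1 * rng.standard_normal(200)

# Eigen-decomposition of the data covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # sort descending

pc1 = eigvecs[:, order[0]]                    # direction of largest variance
scores = Xc @ eigvecs[:, order]               # data projected onto the PCs
```

The first eigenvector captures the correlated pair of features, and truncating `scores` to the leading columns gives the reduced representation.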

Expectation Maximization as a Classifier
The expectation maximization (EM) algorithm is a method used to compute the maximum likelihood estimate in the presence of latent variables. Consider Z as the observed data, ϕ as the statistical parameters, and δ as the missing data. The aim is to maximize the likelihood function

p(Z|ϕ) = ∫ p(Z, δ|ϕ) dδ (19)

This equation cannot be solved analytically. It is assumed that the complete-data likelihood or the posterior distribution p(δ|Z, ϕ) can be dealt with easily by applying the EM algorithm [37].

To reach convergence, this algorithm iterates between the E and M steps, as follows:
• E step (expectation): calculate the Q function:

Q(ϕ|ϕᵗ) = E_{δ|Z,ϕᵗ}[ln p(Z, δ|ϕ)]

• M step (maximization): compute the maximum:

ϕᵗ⁺¹ = argmax_ϕ Q(ϕ|ϕᵗ)

where t denotes the iteration number. In the E step, for each test point, the likelihood is computed for each individual cluster, and the test point is assigned to the corresponding cluster based on the maximum probability. All parameters are updated in the M step. This algorithm is repeated until it reaches convergence.
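A hedged sketch of EM-based classification, using scikit-learn's `GaussianMixture` (whose parameters are fitted with exactly this E/M iteration) as a per-class likelihood model on synthetic two-dimensional data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for the two classes (normal vs. CVD)
rng = np.random.default_rng(5)
normal = rng.normal(0.0, 1.0, (300, 2))
cvd = rng.normal(3.0, 1.0, (300, 2))

# One EM-fitted mixture per class
gm_normal = GaussianMixture(n_components=2, random_state=0).fit(normal)
gm_cvd = GaussianMixture(n_components=2, random_state=0).fit(cvd)

def classify(x):
    # assign to the class whose mixture gives the higher log-likelihood
    return int(gm_cvd.score_samples(x)[0] > gm_normal.score_samples(x)[0])

pred_near_normal = classify(np.array([[0.0, 0.0]]))
pred_near_cvd = classify(np.array([[3.0, 3.0]]))
```

Fitting one mixture per class and comparing log-likelihoods is one common way to turn EM/GMM density estimation into a classifier.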

Logistic Regression as a Classifier
Logistic regression (LR) is a type of supervised machine learning algorithm that is utilized for predicting the probability of a target variable. The inputs are applied through a prediction function, which yields a probability value between 0 and 1, where 1 indicates CVD and 0 indicates normal [38]. To classify the positive and negative classes, a hypothesis h_θ(x) = g(θᵀx) is designed. The threshold value of h_θ(x) for the classifier is 0.5. If h_θ(x) ≥ 0.5, the values are classified into the CVD class; if h_θ(x) < 0.5, they are classified into the normal class. A sample with at least a 50% predicted chance of CVD is thus assigned to class 1, which is why the threshold value of h_θ(x) for the logistic regression classifier was set to 0.5.

The LR function is given below:

h_θ(x) = 1/(1 + e^(−θᵀx))
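The sigmoid prediction function with the 0.5 decision threshold can be sketched on synthetic two-class data with scikit-learn's `LogisticRegression`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data: 0 = normal, 1 = CVD (illustrative only)
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba([[2.0, 2.0]])[0, 1]   # h_theta(x) = P(CVD | x)
label = int(proba >= 0.5)                       # 0.5 decision threshold
```

`predict_proba` returns the sigmoid output directly, so the threshold rule above mirrors the h_θ(x) ≥ 0.5 decision described in the text.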

Gaussian Mixture Model (GMM) as a Classifier
The Gaussian mixture model (GMM) is a machine learning technique utilized to classify data into different groups according to their probability distribution. A combination of a number of Gaussian distributions is referred to as a GMM. Given the data vector y, the GMM is defined as [39]

p(y|θ) = ∑_{q=1}^{Z} π_q M(y|µ_q, Σ_q) (23)

where Σ_q, µ_q, and π_q are the covariance, mean, and mixture weight of the qth component, respectively. The R-dimensional Gaussian distribution is represented by M. The EM algorithm is used to calculate the GMM's parameters by applying the E and M steps.

• E step: the posterior probability, p_iq^t, is evaluated at iteration t.
• M step: utilizing the probabilities evaluated in the E step, the parameters Σ_q, µ_q, and π_q are updated at iteration t + 1.

These two steps are repeated until the parameters stabilize at specific values.

Bayesian Linear Discriminant Analysis as a Classifier
The Bayesian linear discriminant classifier (BLDC) is a generative model that estimates the probability distribution of the data for each class and uses Bayes' theorem to predict the class of new data. The class is chosen so as to maximize the class posterior probability given an observation k; in the case of two classes, x and y, class x is chosen if q_x(k) − q_y(k) ≥ D, where D denotes the decision threshold and q_x(k) = ln P(x|k) is the discriminant function [40].

Assume that the observations of every class are drawn from a multivariate normal distribution and that the covariance matrix is identical for all classes. By applying Bayes' rule, the discriminant function is given as follows:

q_x(k) = µ_xᵀ Σ⁻¹ k − (1/2) µ_xᵀ Σ⁻¹ µ_x + ln P(x)

where µ_x is the mean feature vector for class x, Σ denotes the pooled covariance matrix for all classes, and P(x) denotes class x's prior probability. If the prior probabilities of all classes are considered equal, the decision boundary D = 0 is given as follows:

(µ_x − µ_y)ᵀ Σ⁻¹ k − (1/2)(µ_xᵀ Σ⁻¹ µ_x − µ_yᵀ Σ⁻¹ µ_y) = 0

The farther apart the mean vectors µ_x and µ_y are in the feature space, i.e., the larger the term Σ⁻¹(µ_x − µ_y), the more separable the classes.
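The discriminant q_x(k) with a pooled covariance matrix can be sketched directly (synthetic two-class data; equal priors are assumed, so the decision threshold D is zero):

```python
import numpy as np

# Synthetic two-class data (illustrative)
rng = np.random.default_rng(7)
Xa = rng.normal(0, 1, (200, 2))       # class x
Xb = rng.normal(2, 1, (200, 2))       # class y

mu_a, mu_b = Xa.mean(axis=0), Xb.mean(axis=0)
pooled = (np.cov(Xa, rowvar=False) + np.cov(Xb, rowvar=False)) / 2
inv = np.linalg.inv(pooled)

def q(k, mu, prior=0.5):
    # q(k) = mu^T S^-1 k - 0.5 mu^T S^-1 mu + ln P(class)
    return mu @ inv @ k - 0.5 * mu @ inv @ mu + np.log(prior)

# Decision rule with D = 0: pick the class with the larger discriminant
k1 = np.array([2.0, 2.0])
choose_b = q(k1, mu_b) - q(k1, mu_a) >= 0
k0 = np.array([0.0, 0.0])
choose_a = q(k0, mu_a) - q(k0, mu_b) >= 0
```

Because the covariance is shared, the resulting decision boundary is linear in k, as the equations above imply.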

Firefly Algorithm as a Classifier
The firefly algorithm is a metaheuristic approach used for solving optimization problems; it is inspired by the flashing patterns exhibited by fireflies and was first developed by Yang [41]. The firefly algorithm employs three idealized rules:

1. All fireflies considered here are unisex, so one firefly will be attracted to other fireflies irrespective of sex.
2. The attractiveness of a particular firefly varies with its brightness. Thus, for any two fireflies, the brighter firefly effectively pulls in the darker firefly. If there are no fireflies brighter than a particular firefly, that firefly will move arbitrarily.
3. As the distance from a firefly increases, its perceived brightness or light intensity decreases, because the light is absorbed as it passes through the air. Subsequently, the attractiveness or brightness of a particular firefly, s, as seen by firefly t, is characterized as

α_s(r) = α_s(0)·e^(−βr²)

where β is the light absorption coefficient of the medium, α_s(0) signifies the brightness of firefly s at r = 0, and r indicates the Euclidean distance between firefly t and firefly s:

r = ‖z_t − z_s‖

where z_t and z_s are the respective positions of fireflies t and s. If firefly s is the brighter one, then its attractiveness directs the movement of firefly t according to the following condition:

z_t = z_t + α_s(0)·e^(−βr²)·(z_s − z_t) + γ·rand

where γ is the randomization parameter, and rand denotes a random number taken from a uniform distribution in the range between −1 and +1, inclusive. Firefly t effectively moves towards firefly s through the second term in the above equation.

Harmony Search as a Classifier
Harmony search (HS) is a music-based metaheuristic algorithm inspired by the evolution of music and the pursuit of the ideal harmony. Geem et al. [42] proposed that HS imitates the improvisational method used by musicians. The steps to be followed in the HS algorithm are:

1. Problem Definition and HS Parameter Initialization: an unconstrained optimization problem is described as the minimization or maximization of the objective function f(Y):

min (or max) f(Y), Ly_i ≤ y_i ≤ Uy_i, i = 1, …, n

where Y denotes the decision variable set; y_i is the set of all possible values of every decision variable; and Uy_i and Ly_i represent the upper and lower bounds of the ith decision variable.
2. Initialization of the Harmony Memory: in this stage, the harmony memory (HM) is initialized. All decision variables in the HM are kept as a matrix. The initial harmony memory is created from a uniform random distribution of values constrained by the parameters Uy_i and Ly_i.
3. Improvisation of a New Harmony: the HM is utilized in this process to create a new harmony.
4. Updating the Harmony Memory: the HM is updated with the new harmony vector, and the worst harmony vector is deleted from the HM if the new improvised harmony vector is superior to the worst harmony vector in the HM.
5. Verification of the Terminating Criterion: when the termination criterion is satisfied, the iterations are terminated. If not, steps 3 and 4 are repeated until the allotted number of iterations has been reached.
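These steps can be sketched as a minimal harmony search loop; the sphere objective, memory size, HMCR, PAR, and bandwidth values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(8)

def f(y):                          # objective to minimise (assumed sphere)
    return np.sum(y ** 2)

dim, hms, hmcr, par, bw = 3, 10, 0.9, 0.3, 0.05
low, up = -5.0, 5.0

hm = rng.uniform(low, up, (hms, dim))          # step 2: initialise the HM
cost = np.array([f(h) for h in hm])

for _ in range(2000):                          # steps 3-5
    new = np.empty(dim)
    for i in range(dim):
        if rng.random() < hmcr:                # draw from memory...
            new[i] = hm[rng.integers(hms), i]
            if rng.random() < par:             # ...with pitch adjustment
                new[i] += bw * rng.uniform(-1, 1)
        else:                                  # or draw at random
            new[i] = rng.uniform(low, up)
    new = np.clip(new, low, up)
    c = f(new)
    worst = np.argmax(cost)                    # step 4: replace the worst
    if c < cost[worst]:
        hm[worst], cost[worst] = new, c

best = cost.min()
```

HMCR controls how often components are recalled from memory, and PAR with the bandwidth bw controls the local "pitch adjustment" refinement around remembered values.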

Detrend Fluctuation Analysis as a Classifier
Detrended fluctuation analysis (DFA) is a mathematical method used to analyze the presence of long-term correlations, or persistence, in a time series. The main purpose of DFA is to quantify the scaling of long-range correlations in a time series. DFA is very similar to Hurst exponent analysis and is an enhancement of standard fluctuation analysis [43]. DFA relies heavily on random walk theory.
The cumulative profile, M_t, is obtained from the bounded time series m_q of length K as

M_t = ∑_{q=1}^{t} (m_q − ⟨m⟩)

where ⟨m⟩ is the mean of the series. After dividing K into time windows, each with a length of n samples, a local least squares straight-line fit is calculated by minimizing the squared errors inside each window. The average fluctuation function is given by

F(n) = √((1/K) ∑_{t=1}^{K} (M_t − N_t)²)

where N_t is the piecewise sequence of the straight-line fits.
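A compact sketch of this computation, estimating the scaling exponent of an uncorrelated series (for white noise the exponent should be close to 0.5):

```python
import numpy as np

def dfa_fluctuation(m, n):
    """Average fluctuation F(n) for window length n (sketch of the method)."""
    M = np.cumsum(m - np.mean(m))                 # cumulative profile M_t
    n_win = len(M) // n
    sq = []
    for w in range(n_win):
        seg = M[w * n:(w + 1) * n]
        t = np.arange(n)
        coef = np.polyfit(t, seg, 1)              # local straight-line fit N_t
        sq.append(np.mean((seg - np.polyval(coef, t)) ** 2))
    return np.sqrt(np.mean(sq))

rng = np.random.default_rng(9)
wn = rng.standard_normal(1000)                    # uncorrelated (white) series
F4, F64 = dfa_fluctuation(wn, 4), dfa_fluctuation(wn, 64)
alpha = np.log(F64 / F4) / np.log(64 / 4)         # scaling exponent estimate
```

The slope of log F(n) versus log n gives the DFA scaling exponent; long-range correlated signals yield exponents above 0.5, while uncorrelated noise stays near 0.5.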

Probably Approximately Correct (PAC) Bayesian Learning Method as a Classifier
PAC Bayes is a generic framework for efficiently rethinking generalization for numerous machine learning algorithms. It leverages the flexibility of Bayesian learning and allows new learning algorithms to be derived [44]. In PAC Bayesian learning theory, the hypothesis space is denoted as S, and the quantity of interest is the deviation between the empirical error and the expected error of a hypothesis s ∈ S, expressed by a function θ(s). PAC Bayesian analysis provides high-probability bounds on the deviation of weighted averages of θ(s) over independent random variables. The prior distribution over the hypothesis space is denoted as Π, and the randomized classifier is defined as γ. Each time the game is played, the randomized classifier selects a hypothesis s from S in accordance with γ and uses it to predict the outcome of the subsequent sample.

KNN-PAC Bayesian Learning Method as a Classifier
The KNN (K-nearest neighbors)-PAC (probably approximately correct) Bayesian learning method is a machine learning algorithm that makes more accurate predictions by finding the nearest neighbors and updating the probability distribution of the data. The PAC Bayesian classifier is used to evaluate the divergence and training error on the finite data sample; for greater divergence, the risk factor will be higher. The PAC Bayesian classifier output is fed to the KNN classifier to improve the classification accuracy. The KNN algorithm simply stores the dataset during the training phase and then assigns incoming data to the category closest to the previously stored samples. An unseen sample can thus be classified based on the training data and samples. The selection of the K value is a critical stage in the KNN algorithm [45].

Softmax Discriminant Classifier (SDC) as a Classifier
The softmax discriminant classifier (SDC) is a supervised machine learning algorithm used for multiclass classification problems. It is based on the concept of the discriminant function, which maps input variables to a class label. The SDC identifies the class to which a testing sample belongs by weighing the distance between the testing sample and the training samples of that particular class [46]. Consider the training set X = [X_1, X_2, . . . , X_k] ∈ R^(m×n), drawn from "k" different classes, where X_k = [X_k^1, X_k^2, . . . , X_k^(n_k)] ∈ R^(m×n_k) represents the n_k samples from the kth class and Σ_(j=1)^(k) n_j = n; a testing sample is assumed to be x ∈ R^(m×1).
Here, the SDC is defined as

f(x) = arg max_j q_j(x), with q_j(x) = log Σ_(i=1)^(n_j) exp(−λ ‖x − x_i^j‖),

where q_j(x) denotes the closeness between the testing sample and the jth class samples, and λ > 0 imposes a relative penalty cost. When x and the x_i^j have similar characteristics, i.e., x belongs to the jth class, ‖x − x_i^j‖ is driven almost to zero, and q_j(x) asymptotically reaches its maximum value.
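A sketch of how such a discriminant can be computed, assuming the common form q_j(x) = log Σ_i exp(−λ‖x − x_i^j‖) with a two-class toy dataset invented for illustration:

```python
# Softmax discriminant classifier sketch: pick the class whose training
# samples lie closest to the test point under q_j(x).
import numpy as np

def sdc_predict(classes, x, lam=1.0):
    scores = []
    for Xj in classes:                                  # Xj: samples of class j
        dists = np.linalg.norm(Xj - x, axis=1)          # ||x - x_i^j||
        scores.append(np.log(np.sum(np.exp(-lam * dists))))
    return int(np.argmax(scores))                       # class with max q_j(x)

normal = np.array([[0.0, 0.1], [0.1, 0.0]])             # class 0 samples
cvd = np.array([[1.0, 1.1], [1.1, 0.9]])                # class 1 samples
pred = sdc_predict([normal, cvd], np.array([1.05, 1.0]))
```

Because exp(−λ·d) decays with distance, the discriminant is dominated by the nearest training samples of each class, which is the behavior the definition above describes.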

Detrend with SDC as a Classifier
Detrend fluctuation analysis (DFA) with the softmax discriminant classifier (SDC) is a machine learning algorithm that combines the DFA and SDC techniques to classify time series data with long-range correlations, by removing the trend and identifying the long-term correlations before classification. DFA captures the long-duration correlation properties of the PPG signals, and the SDC is then used to identify the class to which a particular test sample belongs.

Results and Discussion
This section explores the performances of the different classifiers based on their benchmark parameters. A better classification accuracy with a lower error rate leads to a good classifier performance. Therefore, the classifiers were trained and tested on the dimensionally reduced values from the CapnoBase PPG signal dataset.

Training and Testing of the Classifiers
The training and testing of the classifiers are very important steps in any classification process. Training allows a classifier to learn the patterns associated with the given DR data. In this study, we chose 90% of the data for training and 10% for testing. The mean square error (MSE) was maintained as the stopping criterion for the training and testing of the classifiers. The mathematical expression for the MSE is given below:

MSE = (1/M) Σ_(i=1)^(M) (O_i − T_k)²,

where O_i is the observed value at a definite time; T_k indicates model k's target value, where "k" varies from 1 to 15; and M, assumed as 1000, denotes the number of observations per case.
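The stopping criterion above can be sketched as follows; the classifier outputs and the target are stand-in values, not results from the study.

```python
# MSE stopping criterion: training halts when the MSE between the
# M observed outputs and the target value falls below 10^-5.
import numpy as np

def mse(observed, target):
    observed = np.asarray(observed, dtype=float)
    return np.mean((observed - target) ** 2)

out = np.full(1000, 0.849)       # M = 1000 classifier outputs (stand-ins)
stop = mse(out, 0.85) < 1e-5     # True once the outputs are close enough
```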

Selection of the Optimal Parameters for the Classifiers
Consider that the PPG dataset had two classes, namely, CVD and normal. When determining the target values, the target T_CVD was carefully selected with higher values in the range from 0 to 1. The condition used for selecting T_CVD is as follows: the features of the total (X) CVD PPG data were normalized, and their mean, signified by µ_i, as mentioned in Equation (40), can be applied for the classification.
For the normal subjects, the target T_Normal, with lower values between 0 and 1, was preferred when implementing the condition: the features of the total (Y) normal PPG data were normalized, and their mean, signified by µ_j, as mentioned in Equation (41), can be applied for the classification.
The T_CVD value should be greater than the estimated µ_i and µ_j, and it must be ensured that the difference between T_CVD and T_Normal is nonzero and greater than 0.5.
Depending on the condition given in (42), the T_CVD and T_Normal values were set as 0.85 and 0.1, respectively. The classifiers were trained with a 10-fold training and testing method, with an MSE value of 10⁻⁵ or a maximum of 1000 iterations, whichever was achieved earlier, as the stopping criterion. Table 2 demonstrates the selection of the optimal parameters for the classifiers. Table 3 illustrates the analysis of the testing MSE values for the CVD and normal cases across the various classifiers with the different DR techniques. It is perceived from Table 3 that, for the CVD cases, the ABC-PSO DR method with the DFA classifier resulted in the overall minimum MSE value of 4.00 × 10⁻⁸, while the cuckoo search DR technique with the PCA classifier resulted in the overall maximum MSE of 6.60 × 10⁻⁴. Similarly, for the normal cases, the overall minimum MSE of 9.00 × 10⁻⁸ was obtained when the Hilbert transform DR values were classified with the harmonic search classifier, and the overall maximum MSE of 4.84 × 10⁻⁴ was obtained when the cuckoo search DR values were classified with the logistic regression classifier. Table 2. The selection of the optimal parameters for the classifiers.
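The target assignment and 10-fold partitioning described above can be sketched as follows; the feature matrix is a small random stand-in for the dimensionally reduced PPG data (the actual study uses 29,520 segments), and the baseline predictor is illustrative only.

```python
# Target assignment (T_CVD = 0.85, T_Normal = 0.1) and a 10-fold split.
import numpy as np

T_CVD, T_NORMAL = 0.85, 0.10
assert T_CVD - T_NORMAL > 0.5                 # difference condition holds

rng = np.random.default_rng(2)
X = rng.random((40, 4))                       # 40 segments, 4 DR features
y = np.array([T_CVD] * 20 + [T_NORMAL] * 20)  # 20 CVD, 20 normal segments

idx = rng.permutation(len(X))
folds = np.array_split(idx, 10)               # 10 disjoint test folds
fold_mse = []
for test_idx in folds:
    train_idx = np.setdiff1d(idx, test_idx)
    pred = y[train_idx].mean()                # trivial baseline predictor
    fold_mse.append(np.mean((y[test_idx] - pred) ** 2))
```

In the study, a real classifier replaces the baseline predictor, and training on each fold continues until the MSE criterion or the iteration cap is reached.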

Principal component analysis (PCA): decorrelated eigenvector w_k and a threshold value of 0.72, trained by trial and error with an MSE of 10⁻⁵ or a maximum of 1000 iterations, whichever happens first.
Detrend with SDC: cascaded condition of the DFA and SDC classifiers, with the parameters as mentioned above.

Performance Metrics of the Classifiers
In order to analyze the performance of the classifiers, the parameters, namely the performance index (PI), sensitivity, specificity, accuracy, good detection rate (GDR), and error rate, were calculated from the confusion matrix. Table 4 depicts the general confusion matrix for the detection of CVD.
True positive (TP): An output where the model accurately predicted the positive class, indicating that the person has cardiovascular disease.
True negative (TN): An output where the model accurately predicted the negative class, which shows that it is a healthy person.
False positive (FP): An output in which the positive class was incorrectly predicted by the model, which indicates that the healthy person is incorrectly classified as having CVD.
False negative (FN): An output in which the negative class was incorrectly predicted by the model, which indicates that a person with CVD is incorrectly classified as a healthy person.
PPG signals are sampled at 200 samples per second; therefore, 144,000 samples per patient are available. There are 41 patients, with 20 labeled as having CVD and 21 labeled as normal cases. A one-second-long PPG signal is considered a segment; hence, there are 720 such segments available per patient. The total number of segments is [20 × 720 = 14,400] for the CVD cases and [21 × 720 = 15,120] for the normal cases. Therefore, the overall number of available segments for the 41 cases is 29,520 segments of one-second duration. The PPG signals are analyzed across the patients based on the signal segments; beat-to-beat analysis is not included in this study.

As a sample, the confusion matrix attained with the Hilbert transform DR method for the different classifiers is shown in Table 5. As indicated in Table 5, the harmonic search classifier attained a higher classification capability than the other eleven classifiers, with low MSE values. At the same time, the firefly classifier performed worst, with more false negatives and fewer true positives.

The performance index (PI), sensitivity, specificity, accuracy, good detection rate (GDR), and error rate were calculated from the entries of the confusion matrix, as defined in [19]. As a sample of the classifier performance analysis, the cuckoo search DR method was undertaken, as shown in Table 6. As mentioned in Table 6, the harmonic search classifier performed better than all other classifiers in terms of the parametric values, with an accuracy of 96.095%, a performance index (PI) of 91.29%, a sensitivity of 92.185%, a specificity of 100%, the highest GDR of 92.19%, and the lowest error rate of 7.81%. On the other hand, the firefly classifier showed the lowest performance with respect to all of the parametric values, with an accuracy of 78.275%, a performance index (PI) of 20.76%, a sensitivity of 56.15%, a specificity of 100%, a GDR of 56.15%, and a high error rate of 43.855%.
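The standard confusion-matrix metrics can be computed directly from the four counts; the counts below are invented purely for illustration, and the PI and GDR, which follow the definitions in [19], are omitted.

```python
# Sensitivity, specificity, accuracy, and error rate from TP/TN/FP/FN.
def metrics(tp, tn, fp, fn):
    sens = 100.0 * tp / (tp + fn)                  # sensitivity (%)
    spec = 100.0 * tn / (tn + fp)                  # specificity (%)
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # accuracy (%)
    err = 100.0 - acc                              # error rate (%)
    return sens, spec, acc, err

# Hypothetical counts for one classifier over 1440 test segments.
sens, spec, acc, err = metrics(tp=690, tn=710, fp=10, fn=30)
```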
It is also observed from Table 6 that the GMM classifier reached a high sensitivity of 100% but ebbed to a low specificity of 84.38% due to the low number of true negative subjects. Even though the PCA, logistic regression, BLDC, firefly, DFA, and PAC Bayesian learning classifiers showed 100% specificity, this did not assure good sensitivity values, except in the case of the harmonic classifier.

Table 7 exhibits the consolidated classifier performance analysis across the different DR techniques. We can deduce from Table 7 that the Hilbert transformation DR approach with the harmonic search classifier retained its number one position, with the highest PI of 96.485%, the highest accuracy of 98.31%, the lowest error rate of 3.38%, and the highest GDR of 96.55%, whereas the logistic regression classifier produced the lowest PI of 17.07%, the lowest accuracy of 77.38%, and the highest error rate of 45.245% under the Hilbert transformation DR technique. The lowest GDR of 40.755% was produced by the firefly classifier under the ABC-PSO DR method.

Figure 4 displays the performance analysis of the classifiers for the different DR techniques with respect to the error rate and GDR parameters. The harmonic search classifier achieved the highest GDR of 96.55%, with the lowest average error rate of 3.38%, for the Hilbert transform DR technique; the logistic regression classifier achieved the highest error rate of 45.245% for the Hilbert transform DR method; and the firefly classifier achieved the lowest GDR of 40.755% for the ABC-PSO DR technique.

Figure 5 demonstrates the performance analysis of the classifiers for the different DR techniques with respect to the accuracy. According to Figure 5, the harmonic search classifier clearly produced the highest accuracy of 98.31%, and the logistic regression classifier had the lowest accuracy of 77.38%, for the Hilbert transformation dimensionality reduction method.
Next, we extensively examined the classifier accuracy as follows. For the PCA classifier, a higher accuracy of 87.45% was attained for the ABC-PSO DR method, and a lower accuracy of 80.36% was limited to the cuckoo search DR method. Figure 5 shows that the EM classifier maintained a high accuracy of 89.715% when using the cuckoo search DR method, while the NLR DR technique yielded a low accuracy of 82.255%. The high accuracy value for the logistic regression classifier was limited to 84.015% for the ABC-PSO DR method, with a low accuracy of 77.38% for the Hilbert transformation DR technique. The GMM classifier placed a high accuracy of 95.12% with the dragonfly DR method and achieved a low accuracy of 86.79% with the NLR DR technique.
The BLDC classifier secured a high accuracy of 90.63% with the cuckoo search DR method and retained a low accuracy of 80.21% with the ABC-PSO DR technique. The firefly classifier settled at a high accuracy of 94.145% with the NLR DR method and a low accuracy of 78.275% for the cuckoo search DR technique. The harmonic search classifier showed a remarkable performance, with a high accuracy of 98.31% with the Hilbert transformation DR method, and maintained a low accuracy of 91.805% for the ABC-PSO DR technique. The harmonic search classifier maintained a high accuracy across the different DR methods, which was due to the better segregation and learning ability of the classifier. In the case of the DFA classifier, a high accuracy of 95.575% was achieved with the ABC-PSO DR method and a low accuracy of 88.38% with the NLR DR technique; the DFA classifier exhibited the second-best classification accuracy performance across the DR techniques. The PAC Bayesian learning classifier retained a good accuracy of 83.525% with the Hilbert transformation DR method and a low accuracy of 78.7% with the dragonfly DR method. The KNN-PAC Bayesian learning hybrid classifier reached a high accuracy of 89.26% for the Hilbert transformation DR method and maintained a low accuracy of 84.765% with the NLR DR method. The SD classifier achieved a high accuracy of 94.99% with the cuckoo search DR technique and displayed a drastically low accuracy of 83.615% for the dragonfly DR technique. The DFA-SDC hybrid classifier remained at a high accuracy of 92.585% with the NLR DR method and reached a low accuracy of 82.215% for the dragonfly DR method.
The robustness of the classifiers is reflected in their accuracy across the five dimensionality reduction techniques, namely, the Hilbert transform (HT), nonlinear regression (NLR), artificial bee colony-particle swarm optimization (ABC-PSO), cuckoo search, and dragonfly. All classifiers except the logistic regression classifier settled at maximum accuracies above 80%, as shown in Table 7. This is because the classifiers were trained and their optimal parameters were attained after the tuning process. The k-fold training and testing of the classifiers made them more robust for the detection of CVD from the submitted PPG signal.

Summary of Previous Works on the Detection of CVD Classes
A summary of previous works on the detection of CVD classes is listed in Table 8. The time and frequency domain features, SVD, and stochastic features were extracted from the PPG signals, and these features were classified with various classifiers, such as the ANN, KNN, ELM, GMM, softmax regression model, DNN, SDC, SVM, and harmonic search, to detect cases of CVD. It is observed from Table 8 that Soltane et al. [47] proposed an artificial neural network (ANN) method to divide PPG signals into two different classes. The input signal was smoothed to reduce the dimensionality, and the features were explored with highly parallelized multilayer feed-forward networks (MFN), which achieved a classification rate of 94.7% for the testing datasets and 100% for the training datasets. Hosseini et al. [48] utilized finger PPG, a noninvasive optical signal collected before and after reactive hyperemia, to distinguish between people with various CVDs, with a maximum accuracy of 81.5% for the KNN classifier. Shobitha et al. [49] used the extreme learning machine (ELM), a supervised learning algorithm, to classify PPG signals as normal or affected by cardiovascular illness and compared its performance with backpropagation and support vector machine (SVM) techniques. These algorithms were validated by testing healthy and pathological signals from each of the 30 patients. With only five features as input, the ELM produced the best accuracy, with a specificity of 90.33% and a sensitivity of 89.33%, and it also took less computational time to determine the risk of CVD. Prabhakar et al. [50] considered PPG signals obtained from a single patient; they extracted the statistical features, and the annotation of the PPG signals was conducted by using SVD. The annotated features of the class labels were verified and classified by the GMM, which achieved an accuracy of 98.97%.
This may be due to the smaller class vector size and an overfitting condition for the GMM classifier. In the present work, heuristic- and transformation-based dimensionally reduced PPG data samples of 21 normal and 20 CVD cases were considered, and the GMM classifier reached a maximum accuracy of 95.12% with the dragonfly dimensionality reduction technique. Classification and prediction models using deep neural networks (DNNs), based on patient diagnostic results for coronary heart disease, were created and tested by Miao and Miao [51]. The created DNN learning model consisted of a classification model based on training data, and 303 clinical instances from patients with coronary heart disease at the Cleveland Clinic Foundation were used to create a prediction model for diagnosing new patient cases. The results of the tests indicate that the created classification and prediction model had an 83.67% diagnosis accuracy for heart disease. Hao et al. [52] proposed the softmax regression model, which employs neural networks for training and learning and calculates the probability that reclassified data will fall into each category; this method classified CVD with an accuracy of 94.44%. Divya et al. [53] proposed a computer-aided diagnostic system that uses PPG signals to determine the different levels of CVD risk. From the PPG signals, statistical characteristics, wavelets, and singular value decomposition (SVD) features were retrieved. The extracted feature vectors were classified with the SDC and GMM classifiers to indicate the various risk levels of CVD. The results show that a classification accuracy of 97.88%, a specificity of 99.09%, and a sensitivity of 97.24% were obtained by incorporating the SDC with the SVD and statistical features; in addition, a classification accuracy of 96.64%, a specificity of 99.65%, and a sensitivity of 93.80% were obtained by incorporating the GMM with the SVD and statistical features. Prabhakar et al.
[54] used a fuzzy-based approach to optimize the extracted parameters from PPG signals. The statistical features were extracted from the PPG signals, and fuzzy-based modeling was utilized to predict the CVD risk levels. To optimize the fuzzy model levels, four types of optimization were performed. The optimized values were then categorized using the appropriate classifiers, and the support vector machine-radial basis function (SVM-RBF) classifier produced a maximum classification accuracy of 95.05% when the fuzzy model-based levels were optimized with animal migration optimization (AMO). A deep convolutional neural network was developed by Liu et al. [55] to classify multiple rhythms of 23,384 PPG waveforms from 45 patients and achieved an accuracy of 85%. Ihsan et al. [56] studied feature extraction algorithms, such as the respiratory rate (RR) interval, HRV features, and time domain features, for detecting coronary heart disease using PPG, and achieved an accuracy of 94.4% for the HRV features using the decision tree classifier. Al Fahoum et al. [57] extracted the time domain features and health status information from PPG signals and applied feature selection-based classifiers in order to distinguish between healthy persons and CVD patients. Seven distinct classifiers were utilized to classify the dataset and apply the feature selection; in the first stage, the naïve Bayes classifier achieved the highest accuracy of 94.44%, and in the second stage, an accuracy of 89.37% was attained. Rajaguru et al. [58] extracted the statistical features from the CapnoBase PPG signals of a single CVD patient, and the extracted features were classified with linear regression, which produced an accuracy of 65.85%.
In this research, the harmonic search classifier yielded the best classification accuracy of 98.31% for the HT DR values. The Hilbert transform is a linear operator that introduces a 90-degree phase shift in a signal, which provides the separation required for the phase exploration performed by the harmonic search classifier. Since the harmonic search classifier performs a pitch adjustment of the harmonics, more of this phase exploration is possible, providing a better classification. The Hilbert transform segregates the signals at the first level itself, which reduces the burden on the classifiers; hence, the harmonic search classifier yielded a better classification accuracy. In identifying a good classifier, the computational complexity plays a tradeoff role, as discussed below.
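The 90-degree phase shift can be verified numerically; this sketch uses SciPy's FFT-based analytic signal on a 5 Hz test tone sampled at the study's 200 samples per second (the tone itself is illustrative, not PPG data).

```python
# The imaginary part of the analytic signal of cos(2*pi*f*t) is the
# Hilbert transform of the input: sin(2*pi*f*t), a 90-degree shift.
import numpy as np
from scipy.signal import hilbert

t = np.linspace(0, 1, 200, endpoint=False)   # 1 s at 200 samples/s
x = np.cos(2 * np.pi * 5 * t)                # 5 Hz test tone
analytic = hilbert(x)                        # x + j * H{x}
shifted = analytic.imag                      # ~ sin(2*pi*5*t)
```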

Computational Complexity Analysis of the Classifiers
The computational complexity may also serve as a performance metric for a classifier. Computational complexity is analyzed by utilizing an input of size m. If the complexity is O(1), the computational cost is very low and independent of the input size; the cost increases with the number of inputs, and the complexity is denoted as O(m log m) when it grows log m times with respect to the increase in m. Table 9 indicates the computational complexity of the classifiers among the various dimensionality reduction techniques. Under the Hilbert transformation DR technique, the logistic regression and firefly classifiers had the lowest computational complexity of O(m log m). The highest computational complexity of O(m⁷) was reached by the KNN-PAC Bayesian classifier with the ABC-PSO optimization technique. Even though the DFA classifier with the ABC-PSO DR technique had a high computational complexity of O(m⁵), it exhibited a higher accuracy of 95.58%. This higher accuracy was due to the characteristic features of the DFA: the DFA identifies the peak value of the features, and the ABC-PSO smooths the features and places them in the labeled classes without any outliers.


Conclusions
This study intended to detect cardiovascular disease (CVD) from PPG signals. The dimensionally reduced features obtained from the PPG signals were stored as datasets, and classifiers were then used to detect CVD in the patients. The objective was to classify CVD with a high classification rate and low rates of false positives and false negatives. Even though it is difficult to obtain a perfect classification, a compromise was reached; since a high number of false positives decreases a classifier's accuracy, a low number of false positives is the most important. The main limitation of this work is that the PAC Bayesian learning and logistic regression classifiers failed to achieve a higher classification accuracy across all five dimensionality reduction techniques, and thus the second-to-second detection of PPG classes will result in more false alarms. At the same time, if 30 s segmented epochs of PPG signals were considered for a better classification accuracy, the classifiers would be overfitted by the training process and might end in a deceptively higher accuracy; a compromise is made by taking segments of one-minute duration of the raw PPG signals to attain a better classification accuracy. The results show that a high classification accuracy of 98.31% was attained when the Hilbert transform optimized values were classified with the harmonic search classifier; the second highest accuracy of 97.79% was obtained when the nonlinear regression optimized values were classified with the harmonic search classifier; and the third highest accuracy of 96.095% was obtained when the cuckoo search optimized values were classified with the harmonic search classifier. It was also observed that the harmonic search classifier outperformed the others across all dimensionality reduction techniques.
The convenience and real-time nature of a PPG-based method make it an attractive option for large-scale screening, which has the potential to be helpful in the long-term and real-time monitoring of CVD. PPG-based approaches could potentially be performed remotely, without direct patient contact and with minimal patient training, by wearable devices such as fitness bands and smartwatches. As a result, the use of PPG-based methods could play a significant role in detecting CVD at an early stage and continuously measuring risk factors, leading to timely clinical evaluation. A further enhancement of the classifiers' performance lies in the direction of hyper-parameter selection through heuristic methods. Future research is directed toward CNNs and deep neural networks for the detection of CVD with a minimal time lapse. CNNs are good at extracting features from PPG signals and identifying relevant patterns for CVD detection, while deep neural networks can identify the most relevant risk factors and develop accurate models for CVD detection; by combining these two types of artificial intelligence, healthcare providers can more accurately diagnose and treat patients with CVD.