Leveraging Regression Analysis to Predict Overlapping Symptoms of Cardiovascular Diseases

In medical informatics, deep learning-based models are being used to predict and diagnose cardiovascular diseases (CVDs). These models can detect clinical signs, recognize phenotypes, and select treatment methods for complicated illnesses. One approach to predicting CVDs is to collect a large dataset of patient medical records and use it to train a deep learning model. This study investigated the early prediction of CVDs using deep learning-based regression analysis on a dataset of 2621 medical records from UAE hospitals, including age, symptoms, and CVD information. We propose a long short-term memory (LSTM)-based deep neural network for early prediction of CVDs that leverages regression analysis. Prediction accuracy increased when diseases were simulated in pairs, one disease with another, because of their overlapping symptoms. The results suggest that coronary heart disease was predicted with 71.5% accuracy, rising to 84.4% when combined with the overlapping symptom dyspnea. With a combination of three conditions, the accuracy was 86.7%; with dyspnea, chest pain, and cyanosis, it increased to 88.9%. Weakness, fatigue, and Emptysis yielded 89.8%. In our proposed work, the combination of dyspnea, chest pain, cyanosis, weakness and fatigue, Emptysis, and discomfort/pressure in the chest reached an accuracy of 90.6%, and adding fever raised the accuracy to 91%. We show the effectiveness of our proposed method on several evaluation benchmarks.

on symptoms alone. For example, chest pain or discomfort can be a symptom of both coronary artery disease and heart failure.
Similarly, shortness of breath and fatigue can be symptoms of both heart failure and hypertension (high blood pressure). To accurately diagnose and treat CVDs, it is important to have a thorough medical evaluation, which may include a physical exam, laboratory tests, and imaging studies. In some cases, additional specialized tests may be necessary to confirm a diagnosis and determine the best course of treatment [3].
A cardiovascular disease is a disease of the heart or blood vessels (arteries and veins). Heart disease has overtaken infectious illnesses as the leading cause of mortality and disability globally among people over 65 years old [4]. Because of its alarmingly high and steadily growing prevalence, it is now considered a 'second epidemic' in many nations. Early identification of cardiovascular illness may help decrease the death rate. Echocardiography is one method of diagnosing cardiac conditions. Echocardiography, or echo, is a painless test that uses sound waves to create images of the heart. The test provides information on the heart's size and shape and on how effectively the heart chambers and valves function. Echo may also identify cardiac abnormalities in toddlers and babies [5].
The incidence and mortality rates of CVDs have been increasing in recent years, and diagnosing and preventing these diseases has become challenging. The development of acute cardiovascular events, such as thrombosis and atherosclerotic plaque formation, can span 15-20 years. Such long periods without noticeable symptoms make early diagnosis and screening difficult for physicians [6]. In most cases, such as ischemic cardiomyopathy and acute coronary syndrome, the condition has already reached a progressive phase by the first appointment. Moreover, even when atherosclerotic (AS) plaques are discovered, it is difficult in clinical practice to determine whether the lesions require intervention or are vulnerable [7]. Meanwhile, some CVDs share similar characteristic symptoms, such as chest pain, irregular heartbeat, shortness of breath, pain in the neck, jaw, throat, upper abdomen, or back, and numbness, which increases the chance of misdiagnosis.
Such delayed symptoms and silent indications of CVDs can be estimated and assessed using deep learning neural networks. Deep belief networks, medical image segmentation [7], [8], and Bayesian networks can use large clinical datasets to predict the impact of risk factors. In this research, we develop a predictive model based on machine learning algorithms to help predict the early symptoms of heart disease in the United Arab Emirates (UAE) precisely.
In recent years, deep learning has played a significant role in detecting CVDs [9]. Deep learning algorithms are a type of artificial neural network that can learn and recognize patterns in data, making them well-suited for analyzing large and complex datasets commonly found in medical imaging and diagnostic testing. Deep learning has been instrumental in detecting and diagnosing heart disease from medical images such as echocardiograms, angiograms, and MRI scans. Deep learning algorithms can learn to identify specific patterns and features in these images that may indicate the presence of heart disease, including abnormalities in heart structure, blood flow, and tissue texture [10].
In addition to image analysis, deep learning has also been applied to other types of cardiovascular data, such as electrocardiograms (ECGs) and patient health records. For example, deep learning models have been used to analyze ECGs and predict the risk of arrhythmia and other cardiac events. They have also been used to analyze patient health records and predict the risk of cardiovascular disease based on factors such as age, sex, family history, and lifestyle factors. Overall, the use of deep learning in cardiovascular disease detection and diagnosis is still in its early stages, but the results are promising. As more data becomes available and the technology improves, deep learning is likely to become an increasingly important tool in the fight against heart disease [11].
Deep learning uses artificial neural networks to learn patterns and make predictions from large datasets. It has shown great promise in healthcare, especially in image analysis and natural language processing [12], [13]. One advantage of deep learning is its ability to learn features from raw data, making it useful in cases where manual feature engineering is complicated. Deep learning models can handle large amounts of data and scale well to new tasks, making them well-suited for real-world healthcare applications [14], [15]. One method for predicting CVD is regression analysis on features that deep learning extracts from data patterns [16]. In another work [17], the authors proposed using machine learning algorithms to predict chronic disease outbreaks in communities where such diseases are frequent, and they applied the prediction models to hospital data collected in China between 2013 and 2015. First, the authors employed a latent component model to reconstruct missing data in medical records from a hospital in central China. Second, statistical analysis was used to identify the region's major chronic diseases. A CNN algorithm was then used to select features from unstructured text. Finally, they proposed a CNN-based multimodal disease risk prediction method for structured and unstructured data.
Regression analysis predicts continuous outcomes from predictor variables. Regression models may predict CVD risk using age, blood pressure, cholesterol, and other risk factors [18]. Linear, logistic (for binary outcomes), and non-linear regression models may be used; the choice depends on the data and the prediction task. The regression model should be thoroughly assessed for accuracy and reliability: the dataset is typically divided into training, validation, and test sets, and the final model is evaluated on the held-out test set. Cross-validation further improves the reliability of performance estimates. Regression models may help identify high-risk CVD patients and inform preventive measures and therapy [19].
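The evaluation protocol described above can be sketched as follows. This is a minimal illustration, assuming ordinary least-squares linear regression in NumPy, a 70/15/15 train/validation/test split, and five cross-validation folds; the helper names (`fit_ols`, `split_indices`, `kfold_mse`) and the synthetic data are hypothetical and are not the study's data or code.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bn] for y ~ b0 + X @ [b1, ..., bn]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def split_indices(n, frac_train=0.70, frac_val=0.15, seed=0):
    """Shuffle row indices once and cut them into train/validation/test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = round(frac_train * n), round(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def kfold_mse(X, y, k=5, seed=0):
    """Held-out mean squared error averaged over k cross-validation folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_ols(X[train_idx], y[train_idx])
        errors.append(np.mean((predict(coef, X[test_idx]) - y[test_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in for three risk factors (not study data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([0.5, -0.2, 0.8]) + rng.normal(scale=0.1, size=200)

train, val, test = split_indices(len(y))
coef = fit_ols(X[train], y[train])                                   # fit on training rows only
val_mse = float(np.mean((predict(coef, X[val]) - y[val]) ** 2))      # used for model selection
test_mse = float(np.mean((predict(coef, X[test]) - y[test]) ** 2))   # final held-out estimate
cv_mse = kfold_mse(X, y)
```

The test split is touched only once, after any model choices have been made on the validation split; the k-fold estimate averages over several held-out folds and is therefore less sensitive to one particular split.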
This work aims to create a deep learning-based regression analysis framework for early CVD symptom prediction. As the literature shows, deep neural networks trained on large datasets can learn to predict and extract outputs from training examples. We concentrate on the influence of overlapping symptoms in CVDs, the use of the UAE clinical dataset as a predictor of risk variables, and a contribution to the literature that enhances diagnostic methods for CVDs. To predict the overlapping symptoms of CVDs, we use simple linear regression (SLR) and multiple linear regression (MLR) analyses.
The results of the developed deep learning-based framework suggest that coronary heart disease was predicted with 71.5% accuracy, rising to 84.4% when combined with the overlapping symptom dyspnea. With a combination of three conditions, the accuracy was 86.7%; with dyspnea, chest pain, and cyanosis, it increased to 88.9%. Weakness, fatigue, and Emptysis yielded 89.8%. In our proposed work, the combination of dyspnea, chest pain, cyanosis, weakness and fatigue, Emptysis, and discomfort/pressure in the chest reached an accuracy of 90.6%, and adding fever raised the accuracy to 91%. We show the effectiveness of our proposed method on several evaluation benchmarks.
The rest of this work is organized as follows. Section II reviews the literature about CVD and deep learning models. The methodological description of this paper is available in Section III. We have presented the experimental evaluation in Section IV. A discussion related to the results of the proposed study is available in Section V. Finally, we conclude this work in Section VI.

A. MEDICAL BACKGROUND
Cardiovascular disease (CVD) is the category of illnesses affecting the heart and blood vessels. First, it includes narrowing of the blood vessels that supply blood to the organs, such as coronary heart disease (heart), cerebrovascular disease [20] (brain), and peripheral arterial disease (limbs). Second, it includes damage to the heart muscle and heart valves caused by rheumatic fever, such as rheumatic heart disease, as well as heart muscle failure, which impairs how the heart pumps blood into the blood vessels.
The rising incidence of CVD places a heavy financial burden on public and private healthcare systems [21]. CVD is a burden on the economy and the leading cause of preventable death worldwide [22]. A patient experiencing symptoms or at high risk of heart disease may undergo a battery of diagnostic procedures, including an electrocardiogram (ECG) and a stress test, to determine the best course of treatment. The diagnostic and therapeutic costs of this approach are high because of its complexity and heavy demands on time and resources.
An arrhythmia is a disturbance in the rate or rhythm of the heartbeat: the heart may beat too fast, too slowly, or irregularly. A heartbeat faster than normal is called tachycardia, and one slower than normal is called bradycardia. Atrial fibrillation is the most common arrhythmia and causes an irregular, rapid heartbeat [23].
Noale et al. [24] focused their research on the epidemiology of CVDs among individuals aged 65 years and older. They have examined risk factors associated with the most common causes of disability and mortality in this age group, including hypertension, lipids and lipoproteins, smoking, diabetes, physical inactivity, obesity, and risk of acute stroke.
Athanasiou et al. [25] studied the use of CVD risk prediction models. These models utilize various techniques to improve human understanding of the reasoning process behind CVD risk assessment, increase transparency, and foster trust in the results. One such technique is the XGBoost algorithm, which has gained popularity for its computational speed, generalization capabilities, and high predictive accuracy. They also explored how these techniques could be used to understand and improve existing CVD risk calculators, such as Framingham, SCORE, and DECODE. Ultimately, the team proposed an innovative approach to CVD risk prediction, utilizing the XGBoost and SHAP methods to create an explainable model.
Charlton et al. [26] discussed the basics of wearable photoplethysmography (PPG) and its analysis. With the increasing popularity of smart wearables equipped with PPG sensors, there is an opportunity to monitor cardiovascular health in a convenient and non-invasive way. The PPG signal, which measures arterial blood volume, is explored along with various design configurations. The study highlights the potential applications of wearable PPG devices, such as detecting atrial fibrillation, identifying obstructive sleep apnea, monitoring the spread of infectious diseases, and assessing mental stress. Overall, the authors emphasize the exciting opportunity that wearable PPG devices present for daily monitoring of cardiovascular health.
Chieng and Kistler [27] reviewed the impact of habitual coffee and tea consumption on the prevention of CVD. Their findings suggest that moderate coffee/caffeine intake, equivalent to 2-3 cups daily, can positively affect metabolic syndrome. Additionally, coffee consumption has been linked to a decreased risk of coronary heart disease. Results from the study indicate a 9% reduction in the risk of hypertension with daily consumption of 7 cups of coffee. Moreover, coffee and tea consumption may benefit CVDs like arrhythmia, heart failure, stroke, and cardiovascular mortality.
The triglyceride-glucose (TyG) index, which has been suggested as an alternative biomarker for insulin resistance (IR), was reviewed by Tao et al. [28]. IR, that is, reduced insulin sensitivity and responsiveness, is a predictor of CVD. The TyG index combines lipid- and glucose-related components. According to studies, the TyG index is a reliable and practical instrument for assessing IR and a predictor of obesity, hypertension, and other conditions. When evaluating IR in people with and without diabetes, the TyG index is superior to the widely used HOMA-IR. According to the authors, the TyG index may be optimized for risk classification and CVD outcome prediction.
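The summary above does not state the formula; the commonly cited definition of the TyG index is the natural logarithm of fasting triglycerides (mg/dL) multiplied by fasting glucose (mg/dL), divided by 2. The sketch below assumes that definition and mg/dL units; the function name is hypothetical and the input values are illustrative only.

```python
import math

def tyg_index(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float) -> float:
    """TyG = ln(fasting triglycerides [mg/dL] * fasting glucose [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("laboratory values must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2)

print(round(tyg_index(150, 100), 2))  # 8.92
```

Values reported in mmol/L would need to be converted to mg/dL before applying this form of the index.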
Battineni et al. [29] proposed a study to evaluate the effectiveness of telemedicine systems in providing personalized care for preventing CVD and reducing the need for hospital visits. The use of telemedicine platforms can facilitate regular checkups for CVD patients, and the concept of precision medicine is applicable throughout all stages of CVD. Telemedicine involves securely transmitting clinical information and data using various forms such as voice, text, or images. This technology allows patients and healthcare professionals to have phone or video consultations, providing significant benefits. Telecardiology is the application of telemedicine in cardiovascular medicine. The study highlights the crucial role of telemedicine in delivering personalized care to CVD patients.
Bays et al. [30] published a summary document on the various risk factors associated with CVD. The survey identified ten key risk factors, which include unhealthy dietary habits, physical inactivity, dyslipidemia, hyperglycemia, high blood pressure, obesity, demographics (such as older age, race/ethnicity, and sex differences), thrombosis/smoking, kidney dysfunction, and genetics/familial hypercholesterolemia. It is common for CVD patients to have multiple risk factors, requiring a multifaceted approach to prevention. The document also highlights the importance of physical activity, dietary control, and intermittent fasting in reducing the risk of CVD. Overall, the study emphasizes the significance of identifying and addressing various risk factors associated with CVD to improve prevention strategies.
Dickson et al. [31] conducted a study exploring self-care among older workers with CVD and the impact of work-related factors. The study is based on the middle-range theory of self-care, which defines self-care as a natural decision-making process. Work-related factors, such as job stress resulting from poor work organization, can negatively affect the health of workers with CVD by increasing physiological demands. The study examines several factors that can reduce the risk of CVD in the workplace, including job control, workplace support, work-life balance, and organizational justice. A multi-level intervention is needed to promote sustainable self-care among older workers with CVD within the workplace and to mitigate barriers to self-care.
Depression diagnosis also relies on somatic symptoms that coincide with the overlapping symptoms of certain medical conditions. Ellis et al. [32] studied disease signs correlated with depressive symptoms. The authors reported interviews with 46 of 61 eligible older adults with advanced illness, a population with a significant burden of somatic depressive symptoms [33]. Participants answered questions about their emotions and about depressive symptoms. Hsu et al. [34] noted that the overlap of gastroesophageal reflux disease (GERD) and dyspepsia is very common in the general population, although the relationship between the two is poorly understood. In their study, 107 participants had overlapping ''epigastric pain or burning'', and 761 did not. Subjects with overlapping GERD and dyspepsia (GERD-D) showed more severe GERD symptoms and were more frequently associated with irritable bowel syndrome (IBS) than those with GERD alone [34].
Overlapping cardiovascular symptoms, such as palpitations, chest tightness, and shortness of breath, also arise in healthy individuals under conditions such as stress, which makes it difficult for doctors and their patients to determine whether mental well-being plays a causal or associated role. Primary care clinicians and cardiologists prioritize the management of symptoms and risk conditions, leaving less opportunity to address thoughts and emotions [35].
Even when the criteria for a formal diagnosis of generalized anxiety disorder are not met, general worry about everyday activities, together with psychological distress such as anguish and tension, has been found to promote and accelerate coronary disease [36]. The effect can also be cumulative: repeated episodes of anxiety, cold, and fatigue, even in people with normal coronary arteries, or episodes of palpitations, may be correlated with a greater incidence of cardiovascular disorders and with combinations of CVD symptoms or indicators, such as chest pain, that act as overlapping symptoms of CVDs [37].

B. MODELING METHODS
Deep learning is a technique for constructing high-dimensional predictors in input-output models by reducing high-dimensional data. Specifically, deep learning seeks to select and extract information from abundant data, creating a concise representation capable of explaining complex trends or relationships and addressing challenging problems [38].
Yang et al. [39] conducted a study using electronic health records in which they applied logistic regression analysis to CVDs. They developed prediction models using machine learning methods such as random forest, AdaBoost, bagged trees, and multivariate regression models.
The work by Swathy and Saruladha [3] explores the potential of AI and data mining for predicting CVD. The study compares various models used for CVD prediction, including classification, data mining, machine learning, and deep learning models. The authors identified four categories of CVD and proposed a prediction system consisting of data collection, preprocessing, classification, and model evaluation. They also compared datasets, prediction models, and deployment models, using tools such as WEKA, TANAGRA, and MATLAB for cardiovascular disease.
Ghosh et al. [40] Al-Absi et al. [41] performed case-control research to diagnose CVD using retinal images and dual-energy X-ray absorptiometry (DXA) data. Their deep learning-based solution used DXA scans and retinal fundus images, focusing on the center of the retinal image. According to the research, this non-invasive, fast approach diagnosed CVD with approximately 75% accuracy. The model was tested only on Qatari participants; therefore, its effectiveness in other demographics and locales may vary. A novel DL-based model combining retinal images and DXA scans to differentiate CVD patients from control groups achieved 78.3% accuracy.
Kumar et al. [42] examined how machine learning classifiers can predict cardiovascular disease from patient complaints. The study cleaned and analyzed a dataset using SVM, logistic regression, and KNN classifiers. The random forest classifier yielded 85.71% accuracy and a ROC AUC of 0.8675. Confusion matrices were used to assess classifier accuracy.
Abdalrada et al. [43] created and tested a two-stage machine learning model to predict diabetes mellitus (DM) and CVD. In the first step, a Multivariate Adaptive Regression Splines (MARS) model using logistic regression (LR) and Evimp functions (EVF) identified significant shared risk variables based on voting criteria, and a correlation matrix was used to eliminate duplicate risk variables. In the second step, DM and CVD prediction models were created using the Classification and Regression Tree (CART) method. The first step thus extracted common risk variables from the dataset, and machine learning was then used to forecast comorbidities. Clinicians may benefit from the high accuracy, sensitivity, and specificity of the authors' ML model.
Kee et al. [44] reviewed machine learning-based cardiovascular disease (CVD) prediction models for patients with type 2 diabetes. Scopus and WoS were searched systematically. Logistic regression has been used to predict early CVD in diabetic patients, although machine learning is more flexible. Unlike a single regression model, machine learning approaches can compare many algorithms to find the best fit. NN, SVM, DT, and k-NN algorithms have been used to create prediction models. The suggested models were evaluated using the C-value, sensitivity, accuracy, precision, and area under the curve.
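Several of the measures named in these studies (accuracy, sensitivity, specificity, precision) are derived from the confusion matrix. A minimal sketch, assuming a binary CVD / no-CVD label; the function name and the counts in the example are hypothetical. The area under the ROC curve is not covered, because it requires ranked prediction scores rather than a single confusion matrix.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall, true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "precision": tp / (tp + fp),    # positive predictive value
    }

# Hypothetical counts for a classifier evaluated on 100 patients.
print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
# accuracy 0.85, sensitivity ~0.889, specificity ~0.818, precision 0.8
```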
Venkat et al. [45] conducted a study using AI/ML techniques to analyze RNA-seq data from CVD patients. They aimed to identify genes associated with heart failure (HF), atrial fibrillation (AF), and other CVDs, and to predict disease accurately from the data. To achieve their research objectives, they developed a new approach called ''Findable, Accessible, Intelligent, and Reproducible.'' They proposed a model that included clinical data analysis and the implementation of AI/ML for predictive analysis. The AI/ML methodology was structured into input, data preparation, methodology, data analysis, outcome, and validation.
Wang et al. [46] presented a methodology for high-dimensional pattern regression on medical images using machine learning techniques. In their proposed work, the preprocessed training images are subsampled. Feature extraction and selection are then applied to the medical image data, followed by relevance vector regression. Their framework is built on regression using a relevance vector machine strategy. This work is closely related to our approach in this paper.
In another work, Mudassir et al. [47] applied classification and regression models to Bitcoin data. In their framework, the data are first collected and preprocessed, and feature selection is then applied. Afterward, models are trained for classification and regression, and the trained models are used to make predictions on the data.

A. DATA
Due to the complexity of the research and the need to include as many participants as possible, a quantitative approach is used. In convenience sampling, the probability that any given record is selected is unknown, so inferential statistics about the population cannot be drawn from the sample. We used convenience sampling, a type of non-probability sampling, for this study. The present study uses a total of 2621 records from different UAE hospitals, such as Al Gharbia Hospitals, Al Mafraq Hospital, Al Noor Hospital, Al Raha Hospital, and Dar Al Shifaa. A complete list of the hospitals is available in Table 1.
To evaluate the study's objectives, the sampling population consists of UAE hospitals, and the dataset on CVD risk assessment tools in the UAE by Oulhaj A [48] is used. Further descriptions of the data sources are given in Table 1. The study participants were selected from UAE hospitals, and their medical records were used as the dataset.
Data description: There are three types of input features: 1) Objective: fact-based information; 2) Examination: results of medical investigations; 3) Subjective: patient-oriented information.
Following clinical practice, pathological variables were chosen for sampling the dataset and extracted from many patient cases. Table 2 shows the variables used in this study, categorized into four groups: Basic Information; Symptoms; Inducement and Medical History; and Physical Signs and Assistant Examination. The selected variables, including the symptoms and the abbreviations used in the software, are listed in the following table. Symptoms are scored on a three-level scale: -1 denotes none, 0.5 denotes normal, and 1 denotes higher intensity.
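One way to apply the three-level symptom scale when building model inputs is sketched below. The level labels (`"none"`, `"normal"`, `"high"`), the column order, and the choice to treat an unrecorded symptom as "none" are assumptions for illustration; the abbreviations are those listed in Section IV, and the function name is hypothetical.

```python
# Three-level symptom scale from the text: -1 = none, 0.5 = normal, 1 = higher intensity.
SYMPTOM_SCALE = {"none": -1.0, "normal": 0.5, "high": 1.0}  # label strings are hypothetical

# Abbreviations as listed in Section IV; this column order is an assumption.
SYMPTOM_CODES = ["CHEP", "DYSP", "FEVE", "HEADACHE", "CYAN", "WFAT", "DPCH"]

def encode_record(record: dict) -> list:
    """Map {abbreviation: level label} to a fixed-order numeric vector.

    Symptoms missing from the record are treated as "none" (an assumption);
    an unknown level label raises KeyError instead of being silently guessed.
    """
    return [SYMPTOM_SCALE[record.get(code, "none")] for code in SYMPTOM_CODES]

print(encode_record({"CHEP": "high", "DYSP": "normal"}))
# [1.0, 0.5, -1.0, -1.0, -1.0, -1.0, -1.0]
```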
Descriptive statistics summarize age, race, and medical history, including frequencies, counts, means (M), and standard deviations (SD). The remaining work used a prediction model and a multilayer perceptron implemented in Python with Google's TensorFlow framework. We use simple linear regression (SLR) and multiple linear regression (MLR) to analyze the data.
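The descriptive statistics mentioned above (counts, frequencies, M, SD) can be computed with Python's standard library alone. The function names and values below are hypothetical placeholders, not study data; `statistics.stdev` returns the sample standard deviation.

```python
from collections import Counter
from statistics import mean, stdev

def describe(values):
    """Count, mean (M), and sample standard deviation (SD) of a numeric variable."""
    return {"n": len(values), "M": mean(values), "SD": stdev(values)}

def frequencies(labels):
    """(label, count, percentage) for a categorical variable, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, 100.0 * n / total) for label, n in counts.most_common()]

ages = [45, 52, 61, 58, 70]  # hypothetical ages
history = ["hypertension", "diabetes", "hypertension", "none", "hypertension"]  # hypothetical
print(describe(ages))        # M = 57.2, SD ~ 9.42
print(frequencies(history))  # hypertension first: 3 of 5 (60%)
```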

B. METHODOLOGY
The architecture of our proposed method is illustrated in Figure 1. We collected the dataset from UAE hospitals and applied data preprocessing to it. A preliminary data analysis is run to obtain visual insights from the datasets. Feature selection is done using principal component analysis (PCA) [49]. Our deep long short-term memory (LSTM) network is trained on the UAE hospital data. Finally, we use regression analysis to forecast the degree of overlap in the intensity of the symptoms.
The proposed framework consists of several modules. Initially, the health data from UAE hospitals is used to train our LSTM-based deep neural network (DNN) [50]. PCA is applied to select the most relevant features from the data, and SLR and MLR are applied for CVD predictions. We also evaluate our model using mean absolute error and compare different modules on CVD data.
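Mean absolute error, used above to evaluate the model, averages the absolute differences between observed and predicted values, MAE = (1/n) Σ |y_i - ŷ_i|. A minimal NumPy sketch; the function name and the values on the symptom scale are hypothetical.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred) -> float:
    """MAE = mean of |y_true - y_pred| over all observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical observed vs. predicted symptom intensities.
print(mean_absolute_error([1.0, 0.5, -1.0], [0.8, 0.5, -0.4]))  # about 0.267
```

Unlike squared-error measures, MAE is expressed in the same units as the target and weights all errors linearly, so a few large misses do not dominate it.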

1) FEATURE SELECTION
Feature selection using principal component analysis (PCA) [49] is a valuable method for analyzing CVD data. PCA is a dimensionality reduction technique that identifies essential features in a dataset and reduces the number of features while preserving vital information.
In the context of CVD data, PCA is used to identify the most relevant clinical measurements that are associated with the development of CVD. These measurements include age, gender, blood pressure, cholesterol levels, smoking history, and medical history. To perform PCA-based feature selection on CVD data, we employed the following steps:
• Before applying PCA, we preprocessed the data by normalizing the features to have zero mean and unit variance. This step is critical to ensure that all features have equal weight in the PCA analysis.
• Next, we performed PCA on the preprocessed data. PCA identified the principal components, that is, the linear combinations of the original features that capture the most significant variation in the data.
• Once we had identified the principal components to retain, we used the loadings of each principal component to determine which original features are most important. Features with high loadings on a given principal component are more important for explaining the variation in that component.
• We selected a subset of features based on their importance as determined by the PCA analysis. This subset can be used for subsequent analysis or modeling of CVD risk.
VOLUME 11, 2023
In general, PCA-based feature selection is useful for analyzing CVD data and identifying the most important features for predicting CVD risk. It helped reduce the number of features and improve the interpretability of the data, which is particularly useful in clinical settings where simplicity and ease of use are essential. However, it is crucial to ensure that the selected feature subset reflects the underlying biology of CVD and that the PCA analysis is carefully validated for reliability.
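The four PCA steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the study's implementation: loadings are taken as eigenvectors scaled by the square root of their eigenvalues, and each original feature is scored by its largest absolute loading across the retained components. The function name, the number of retained components, the feature names, and the synthetic data are hypothetical.

```python
import numpy as np

def pca_feature_selection(X, feature_names, n_components=2, n_features=3):
    """Return the n_features original features with the strongest loadings, plus explained variance."""
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each column to zero mean and unit variance (sample SD).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: PCA via eigen-decomposition of cov(Z), i.e. the correlation matrix of X.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # eigh returns eigenvalues in ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    # Step 3: loadings of the retained components (rows = features, columns = components).
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    # Step 4: score each feature by its largest absolute loading and keep the top ones.
    scores = np.abs(loadings).max(axis=1)
    top = np.argsort(scores)[::-1][:n_features]
    return [feature_names[i] for i in top], explained[:n_components]

# Synthetic example: two strongly correlated pairs plus one independent feature.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 500))
X = np.column_stack([a, a + 0.1 * rng.normal(size=500),
                     b, b + 0.1 * rng.normal(size=500), c])
names = ["f1", "f2", "f3", "f4", "f5"]
selected, explained = pca_feature_selection(X, names, n_components=2, n_features=4)
```

In this toy data the two retained components capture the two correlated pairs, so the independent feature `f5` receives a small loading and is dropped.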

2) LSTM FOR CVD
LSTM networks, a type of recurrent neural network (RNN), mitigate the vanishing gradient problem of standard RNNs. LSTMs model sequential data with long-term dependencies well. An LSTM cell comprises an input gate, a forget gate, a cell state, and an output gate. The forget gate discards information from the previous cell state, whereas the input gate adds new information. The cell state acts as the memory of the LSTM, and the output gate controls how much of it is exposed as the hidden state passed to the next time step.
The values of the input gate, forget gate, cell state, and output gate at time step t are computed as follows:
Input gate: $$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
Forget gate: $$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
Cell state: $$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, x_t] + b_c)$$
Output gate: $$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(c_t)$$
where $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates at time step t, respectively; $c_t$ is the cell state and $h_t$ the hidden state at time step t; $x_t$ is the input at time step t; $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input; $W_i$, $W_f$, $W_c$, and $W_o$ are the weight matrices and $b_i$, $b_f$, $b_c$, and $b_o$ the bias vectors for the input, forget, cell, and output gates; $\sigma$ is the sigmoid activation function; $\tanh$ is the hyperbolic tangent activation function; and $\odot$ denotes element-wise multiplication. These equations enable the LSTM to learn the temporal dependencies in patients' medical records, such as their age, symptoms, and medical history. The LSTM is trained on a large dataset of patient records using a regression analysis approach, in which the input is a sequence of medical records and the output predicts the likelihood of developing cardiovascular disease (CVD). The LSTM can be trained on various patient attributes and clinical data, such as age, gender, blood pressure, cholesterol levels, smoking history, and medical history, to predict the likelihood of CVD.
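The gate equations of an LSTM cell can be made concrete with a forward pass written directly in NumPy. This sketch assumes the common formulation in which each gate acts on the concatenation [h_{t-1}, x_t], uses untrained random weights, and adds a hypothetical linear output head to turn the final hidden state into a single regression score. In practice the network would be trained with a framework layer such as TensorFlow's `tf.keras.layers.LSTM`; all helper names here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W[g]: (hidden, hidden + input); b[g]: (hidden,), for g in 'ifco'."""
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate
    c_hat = np.tanh(W["c"] @ z + b["c"])   # candidate cell content
    c_t = f_t * c_prev + i_t * c_hat       # cell state (element-wise products)
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate
    h_t = o_t * np.tanh(c_t)               # hidden state
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (untrained; for illustration only)."""
    rng = np.random.default_rng(seed)
    W = {g: rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size)) for g in "ifco"}
    b = {g: np.zeros(hidden_size) for g in "ifco"}
    w_out = rng.normal(scale=0.1, size=hidden_size)  # hypothetical linear regression head
    return W, b, w_out, 0.0

def lstm_regressor(sequence, params, hidden_size):
    """Run the cell over a (time, features) sequence and map the last h_t to a scalar score."""
    W, b, w_out, b_out = params
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, b)
    return float(w_out @ h + b_out)

# Hypothetical input: 5 time steps x 4 encoded attributes for one patient.
params = init_params(input_size=4, hidden_size=8)
score = lstm_regressor(np.ones((5, 4)), params, hidden_size=8)
```

Because `h_t` is an output-gated `tanh` of the cell state, every hidden unit stays in (-1, 1); the unbounded regression score comes only from the linear head.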
We use the trained LSTM to predict new patients' likelihood of developing CVD by taking in the patient's medical records sequence. By leveraging the temporal dependencies in patient health records, LSTM-based deep neural networks can potentially identify patients at high risk of developing CVD and intervene with preventative measures before the disease progresses. However, it is crucial to ensure that the data used to train the model is representative of the patient population and that the model is thoroughly validated before deployment in a clinical setting.

3) SIMPLE LINEAR REGRESSION (SLR)
Simple Linear Regression is a statistical method used to study the relationship between a dependent variable and one independent variable. It is called ''simple'' because it only involves one independent variable. The method is used to predict the value of the dependent variable based on the value of the independent variable.
The equation for a simple linear regression model is:
$$y = b_0 + b_1 x$$
where y is the dependent variable, x is the independent variable, b0 is the y-intercept, and b1 is the slope of the line. The goal of simple linear regression is to find the values of b0 and b1 that best fit the data. SLR models are fitted using least squares, which minimizes the sum of squared differences between the predicted and actual values of the dependent variable; the resulting best-fit line is called the ''regression line''. SLR is useful for prediction and for understanding the relationship between two variables, but it is limited when the relationships among variables are more complex. Multiple linear regression may be better in such circumstances.
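The least-squares estimates for simple linear regression have a closed form: b1 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and b0 = ȳ - b1·x̄. A minimal pure-Python sketch; the function name and the toy input are hypothetical.

```python
def fit_slr(x, y):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1

print(fit_slr([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```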

4) MULTIPLE LINEAR REGRESSION (MLR)
Multiple linear regression (MLR) examines the relationship between a dependent variable and two or more independent variables. MLR can assess a patient's symptoms, risk factors, and other data to uncover patterns and associations that can be used to predict CVDs. For example, an MLR model may include age, blood pressure, cholesterol, smoking history, and family history of cardiovascular disease to predict the risk of heart attack and stroke.
The equation for an MLR model is as follows:
$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$
where y is the dependent variable (e.g., risk of cardiovascular disease), $x_1, x_2, \ldots, x_n$ are the independent variables (e.g., age, blood pressure), and $b_0, b_1, \ldots, b_n$ are the model coefficients estimated from the data. MLR is a powerful tool for predicting CVDs because it considers multiple variables simultaneously; interactions between variables can also be modeled by adding product terms. However, multiple linear regression assumes that the relationship between the independent and dependent variables is linear and that there is no multicollinearity among the independent variables, and it requires a sufficiently large sample size. When the relationship is non-linear or the sample size is small, other methods such as polynomial regression or decision tree analysis may be more appropriate.

[Figure: The proposed framework of our study. The hospital's data are input and preprocessed, feature selection is performed using PCA, and the LSTM is then trained and regression analysis applied to the data to predict CVDs.]

IV. EXPERIMENTAL EVALUATION
A. MODEL AND ASSUMPTIONS
First, each of the eight potential predictors was analyzed individually using simple linear regression (SLR). The researcher then performed multiple linear regression (MLR) to obtain predictions from combinations of the usability variables; this model is used to evaluate and identify the usability criteria. SPSS (version 25.0) was used for both regression analyses. The SLR findings are shown in Table 3. Based on the questionnaire, we evaluated the data in terms of the following symptoms and how users rated them with respect to heart issues: Chest Pain, Dyspnea, Cyanosis, Emptysis, Fever, Weakness and Fatigue, Discomfort/Pressure in the Chest, and Headache. These are abbreviated as CHEP (Chest Pain), DYSP (Dyspnea), FEVE (Fever), HEADACHE (Headache), CYAN (Cyanosis), WFAT (Weakness and Fatigue), DPCH (Discomfort/Pressure in the Chest), and EMPT (Emptysis).
Each SLR model has the form $Y = b_0 + b_1 X$, where Y is the heart-issue rating and X is a single symptom predictor. In the disease-prediction results, Chest Pain was predicted with an accuracy of 84.6%.

MLR Analysis of Symptoms:
The dataset contained 2621 instances, each described by eight predictors (usability factors). The stepwise multiple linear regression used here aims to minimize the gap between the predicted and observed responses.
This method regresses the dependent variable on progressively larger sets of independent variables (IVs), such as two-factor, three-factor, and four-factor combinations, to build more complex models. The R² value is computed for each combination whose predictors are all statistically significant (sig. = 0.000). The tables in the next section report the MLR results for different combinations of predictors; a higher R² value indicates a better model.
After applying MLR, the researcher obtained results for all combinations of predictor variables. As a result, a secure approach in which data breaches are impossible has been proposed. The number of variables was increased gradually until the optimal model was obtained. Table 4 lists the most effective regression models that produced significant findings.
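The procedure of fitting every two-, three-, four-factor (and larger) combination and keeping the best R² at each size can be sketched as exhaustive best-subset regression; with eight predictors there are only 2^8 - 1 = 255 subsets, so the search is cheap. This is a minimal sketch under the assumption that each record is a dict of numeric predictor values; `ols_r2` and `best_subsets` are hypothetical names, and the sketch ranks subsets by in-sample R² only, omitting the significance (sig.) tests that SPSS reports.

```python
from itertools import combinations


def ols_r2(X, y):
    """Fit y = b0 + b.x by least squares and return the in-sample R^2."""
    A = [[1.0] + list(row) for row in X]  # prepend intercept column
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y]
    aug = [[sum(a[i] * a[j] for a in A) for j in range(p)]
           + [sum(a[i] * yk for a, yk in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    b = [aug[i][p] for i in range(p)]
    y_hat = [sum(bi * ai for bi, ai in zip(b, a)) for a in A]
    y_bar = sum(y) / len(y)
    ss_res = sum((yk - yh) ** 2 for yk, yh in zip(y, y_hat))
    ss_tot = sum((yk - y_bar) ** 2 for yk in y)
    return 1.0 - ss_res / ss_tot


def best_subsets(records, target, names):
    """For each subset size k, keep the predictor combination with the highest R^2."""
    best = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            X = [[rec[name] for name in combo] for rec in records]
            r2 = ols_r2(X, target)
            if k not in best or r2 > best[k][1]:
                best[k] = (combo, r2)
    return best


if __name__ == "__main__":
    # Hypothetical symptom scores; the target depends only on A.
    records = [{"A": 1, "B": 2, "C": 5}, {"A": 2, "B": 1, "C": 3},
               {"A": 3, "B": 4, "C": 1}, {"A": 4, "B": 3, "C": 2},
               {"A": 5, "B": 5, "C": 4}]
    target = [3 * r["A"] + 1 for r in records]
    for k, (combo, r2) in sorted(best_subsets(records, target, ["A", "B", "C"]).items()):
        print(k, combo, round(r2, 4))
```

Because in-sample R² never decreases when a predictor is added, comparing the gain at each step (as the section does for the eighth variable) is what signals when to stop.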
In the MLR model with seven predictors, the highest R-Square is 0.911 (Table 4, Sr. No. 7). Including the eighth variable raises R-Square by only 0.001, whereas each variable added in the earlier models produced a larger change. The final model therefore has seven variables and excludes Headache.
R-Square: The R-square (R²) statistic expresses the proportion of the variation in the dependent variable that can be predicted from the independent variable(s). It serves as a gauge of how strongly the variables are related, with higher values indicating a stronger association. For a least-squares linear regression with an intercept, R² ranges from 0 to 1 and equals the square of the Pearson correlation coefficient between the observed responses and the model-predicted responses. It can therefore be used to assess the model's goodness of fit. R² is formulated as
$R^2 = 1 - \frac{\sum_{n=1}^{N} (y_n - \hat{y}_n)^2}{\sum_{n=1}^{N} (y_n - \bar{y})^2} \quad (9)$
where $y_n$ is the observed outcome for observation n, $\hat{y}_n$ is the model's prediction, $\bar{y}$ is the mean observed outcome, and N is the number of observations; the numerator is the residual sum of squares and the denominator is the total sum of squares.
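Equation (9) and the squared-correlation interpretation can be checked numerically. The sketch below fits a least-squares line to hypothetical data, computes R² as one minus the residual over the total sum of squares, and compares it with the squared Pearson correlation between observed and fitted values; the two agree only because the fit is ordinary least squares with an intercept. All names and data are illustrative.

```python
import math


def fit_slr(x, y):
    """Least-squares intercept and slope, used only to produce predictions."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1


def r_squared(y, y_hat):
    """Equation (9): 1 - residual sum of squares / total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b))
    var_a = sum((p - a_bar) ** 2 for p in a)
    var_b = sum((q - b_bar) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.0, 4.1, 5.9, 8.2, 9.8]  # hypothetical data
    b0, b1 = fit_slr(x, y)
    y_hat = [b0 + b1 * xi for xi in x]
    print(r_squared(y, y_hat), pearson_r(y, y_hat) ** 2)  # equal up to rounding
```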

B. COMPARISON
Using elements refined by the Delphi method, this study determined an optimal hybrid model with eight usability aspects. The independent variables (factors) in the hybrid model that regressively influenced the dependent variable's rating were emptysis, dyspnea, fever, cyanosis, weakness or fatigue, chest discomfort/pressure, and chest pain. This subsection compares our proposed RUF (Rating of Usability Factor) model with the hybrid model. The RUF model is built from the eight-factor hybrid model rating using the SLR and MLR techniques. According to its R² value, the hybrid model's rating accounts for 71.5% of the variation in user ratings. The regression formulation is
$Y = b_0 + b_1 \, OR$
where Y is the predicted user rating (dependent variable) and OR is the rating provided by the hybrid model (independent variable). In terms of the coefficient of determination for forecasting users' ratings of the usability characteristics, the RUF model (seven elements) performs comparably to the hybrid model while using one fewer predictor: RUF explained 91.0% of the variance (R² = 0.910), and the hybrid model explained 91.1% (R² = 0.911).

C. MODEL ASSESSMENT AND VALIDATION
This subsection explains how the model is evaluated with K-fold cross-validation using PRED(x) and MMRE, and summarizes the R-square values of the k-fold models.

1) MODEL ASSESSMENT
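To assess the precision of the forecast model, we use standard average prediction-accuracy criteria. MMRE (mean magnitude of relative error) and PRED(x) are widely used in the literature for accuracy verification. For each prediction, the magnitude of relative error is $\mathrm{MRE} = |y - \hat{y}| / |y|$; MMRE is the mean of the MRE values, and PRED(x) is the fraction of predictions whose MRE is at most x. A model is commonly considered acceptable when MMRE ≤ 0.25 and PRED(0.25) ≥ 0.75; predictions whose MRE exceeds 0.25 do not count toward PRED(0.25). The MMRE value was found to be 0.0515.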
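The MRE, MMRE, and PRED(x) criteria are simple to compute. The sketch below implements them as defined in this subsection, together with the conventional MMRE ≤ 0.25 and PRED(0.25) ≥ 0.75 acceptance rule; the function names and the example values are hypothetical.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one prediction."""
    if actual == 0:
        raise ValueError("MRE is undefined when the actual value is 0")
    return abs(actual - predicted) / abs(actual)


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all predictions."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, x=0.25):
    """PRED(x): fraction of predictions whose MRE is at most x."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= x)
    return hits / len(actuals)


def meets_common_thresholds(actuals, predictions):
    """Conventional acceptance rule: MMRE <= 0.25 and PRED(0.25) >= 0.75."""
    return (mmre(actuals, predictions) <= 0.25
            and pred(actuals, predictions, 0.25) >= 0.75)


if __name__ == "__main__":
    actual = [10, 20, 40, 50]      # hypothetical observed values
    predicted = [11, 20, 28, 50]   # the third prediction has MRE 0.3 > 0.25
    print(mmre(actual, predicted), pred(actual, predicted),
          meets_common_thresholds(actual, predicted))
```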

2) VALIDATION METHOD
K-fold cross-validation is a non-exhaustive model validation method: it estimates how well a model performs on data that were not used to train it. The dataset is randomly divided into k folds (subsets), and the model is trained and evaluated k times, each time training on k - 1 folds and evaluating on the remaining fold. The model's effectiveness on the dataset is then estimated as the average performance over all k iterations. K-fold cross-validation can be used to assess how well a machine learning model, such as a regression model, forecasts CVDs from a dataset of patient medical records. It provides a more reliable estimate of the model's performance than a single train/test split and helps detect overfitting to the training set. Choosing an appropriate value of k is crucial when employing k-fold cross-validation: k = 10 is a popular choice, although k = 5 or k = 20 may also be used depending on the situation.
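The k-fold procedure described above can be sketched in a few lines: shuffle the indices once, cut them into k folds whose sizes differ by at most one, and let each fold serve once as the held-out set. The helper names `k_fold_indices` and `cross_validate` are hypothetical, and the fit, predict, and metric functions are supplied by the caller; this is not the authors' SPSS workflow.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) for each of k folds over n shuffled samples."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(X, y, fit, predict, metric, k=10, seed=0):
    """Average a metric over k folds; fit, predict, and metric are caller-supplied."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        scores.append(metric([y[i] for i in test], preds))
    return sum(scores) / len(scores), scores


if __name__ == "__main__":
    # With 2621 records and k = 10, one fold holds 263 records and nine hold 262.
    print([len(test) for _, test in k_fold_indices(2621, k=10)])
```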
The number of folds is chosen according to the amount of data, and the dataset is divided into folds of equal size. With k-fold cross-validation, every data point is used for both calibration (training) and validation. The data values are split into 10 predetermined folds (K = 1 to K = 10), each containing 21 values (1 to 21, 22 to 42, and so on).
Each fold of the data is shown to be significant even after being divided into eight equal parts, supporting the suitability of our dataset. All values in the k-fold forecasts were found to be significant, and at a significance level of 0.05 all six predictors were significant after training. Note that PRED(0.25) is computed in each fold. Table 5 contains a validation summary with each fold's R² and MMRE values; all MMRE and R² values support the proposed model. Figure 2 shows the results of the final prediction made using the data and R².

D. HYBRID MODEL
When MLR is applied to all eight components, R-Square rises by only 0.1 percentage points, and the added condition is no longer significant. The sixth model is the best choice, since the hybrid design does not provide a practical deep learning procedure to protect a patient from heart failure.
Applying MRE to the remaining K-folds (21-147) with constants of 0-21, and repeating the comparison with irregular data, we found that all of the K-folds fall within 0-21; in this way, we were able to isolate the data. Only one of the 336 MRE values fell outside the PRED(0.25) threshold, and the MMRE criterion was satisfied for all of them, supporting the accuracy of our data. The per-fold MMRE (mean magnitude of relative error) values are 0.1523, 0.0705, 0.0486, 0.0471, 0.0509, 0.0451, 0.0408, and 0.0558, several of which are below 0.05. The grand MMRE is approximately 0.05, indicating that the mean relative error on our dataset is low. MRE values greater than 0.25 are excluded from PRED(0.25), and these results support the adequacy of the Rating of Usability Factor (RUF) model.

E. MODEL ACCURACY
We first describe the accuracy metric used for CVD prediction and then present results obtained without and with time-series approaches. Finally, we report the effect of the quantization technique on accuracy and model size for the CVD data from UAE hospitals.
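Our framework's prediction accuracy is measured by the mean absolute error (MAE), the average absolute difference between predicted and actual values. Because MAE does not square the errors, it penalizes large mistakes less heavily than squared-error metrics and is therefore less sensitive to outliers. Lower values indicate better model performance. MAE is calculated as
$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} |y_n - \hat{y}_n| \quad (11)$
where $y_n$ is the actual value, $\hat{y}_n$ is the predicted value, and N is the number of samples.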
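The difference in outlier sensitivity between MAE and the MSE reported in Table 6 is easiest to see with two error patterns of the same total size. In the hypothetical example below, four errors of size 1 and a single error of size 4 give the same MAE, but the single large error quadruples the MSE. The function names are illustrative.

```python
def mae(y_true, y_pred):
    """Mean absolute error, as defined for this section."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, the metric reported in Table 6."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    actual = [0.0, 0.0, 0.0, 0.0]
    spread_errors = [1.0, 1.0, 1.0, 1.0]   # four errors of size 1
    one_big_error = [0.0, 0.0, 0.0, 4.0]   # one error of size 4
    print(mae(actual, spread_errors), mae(actual, one_big_error))  # 1.0 1.0
    print(mse(actual, spread_errors), mse(actual, one_big_error))  # 1.0 4.0
```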
We evaluate our model with this metric for the prediction of overlapping symptoms on unseen test data. The test data consist of the most recent samples extracted from the dataset and were not used during training or validation of the model. Figure 3 shows the MAE values comparing single-input-time and multi-input-time models under the time-series approach. The single-input-time dense model was more accurate than the single-input-time linear model, possibly because of its additional layers and neurons. However, the multi-input-time dense and convolutional models were less accurate than the single-input-time models. The LSTM model achieved the highest accuracy of all these models, which can be attributed to its memory cell retaining information across time steps. Figure 4 compares single-shot models that predict multiple time steps. Accuracy increases gradually from the linear model to the more advanced models, but it is moderately lower than that of the single-output models; this is expected, because the predictions extend further ahead in time. Table 6 compares performance in terms of MSE on the test data: for this particular problem, quantization has almost no effect on accuracy as measured by MSE.
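The single-input-time, multi-input-time, and single-shot multi-step set-ups differ mainly in how the time-ordered data are sliced into input and label windows, and the test set is taken from the most recent samples. A minimal sketch, assuming a one-dimensional series and using hypothetical helpers `chronological_split` and `make_windows`; the `input_width`, `label_width`, and `shift` parameters mirror the convention of the TensorFlow time-series tutorial, and the paper does not state which window sizes it used.

```python
def chronological_split(series, test_fraction):
    """Hold out the most recent samples as test data (no shuffling)."""
    n_test = round(len(series) * test_fraction)
    cut = len(series) - n_test
    return series[:cut], series[cut:]


def make_windows(series, input_width, label_width, shift):
    """Slice a time-ordered series into (inputs, labels) pairs.

    input_width: past steps given to the model
    label_width: steps the model must predict
    shift:       offset from the end of the inputs to the end of the labels
    """
    total = input_width + shift
    windows = []
    for start in range(len(series) - total + 1):
        end = start + total
        inputs = series[start:start + input_width]
        labels = series[end - label_width:end]
        windows.append((inputs, labels))
    return windows


if __name__ == "__main__":
    train, test = chronological_split(list(range(10)), test_fraction=0.2)
    print(make_windows(train, 1, 1, 1)[0])  # single-input-time, one step ahead
    print(make_windows(train, 3, 1, 1)[0])  # multi-input-time, one step ahead
    print(make_windows(train, 3, 3, 3)[0])  # single-shot, three steps ahead
```

Building windows separately inside the training and test portions, as above, keeps any test-period values from leaking into the training inputs.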

V. DISCUSSION
We briefly analyze our CVD work based on deep learning regression techniques, which was evaluated above in terms of early prediction of overlapping symptoms and disease-prediction accuracy. According to the questionnaire administered for our factors/predictors, most individual users of these applications have made many mistakes in their daily lives from a security point of view. The specialist surveyed how the data could be protected from breaches. The hybrid ideal model was evaluated with SLR on up to eight elements: (i) CHEP, (ii) DPCH, (iii) DYSP, (iv) FEVE, (v) HEADACHE, (vi) EMPT, (vii) WFAT, and (viii) CYAN. All of the traits were found to be significant, although their R-squared values differ widely: in the hybrid model, EMPT accounts for 64% of the variation, DYSP for 69.9%, FEVE for 14%, HEADACHE for 2%, CYAN for 40%, WFAT for 5%, DPCH for 29%, and CHEP for 71.5%. For users who had a heart attack or heart failure, the analysis determined that the most effective overlapping-symptom factors are EMPT, DYSP, CYAN, and CHEP. Multiple linear regression was then applied to every feasible combination of these components. Among the two-variable combinations, 21 of 29 sets were significant (sig. = 0.000), with R-Square values in a good range. The DYSP and CHEP pair has the greatest R-Square (0.844), accounting for 84.4% of the variation in the results and indicating which aspects are most crucial for user safety.
MLR was also applied to three-variable combinations: DYSP, CHEP, and CYAN together explain 86.7% of the model variance, so adding CYAN increased the model's R-Square. MLR was then applied to four- and five-variable combinations. The best four-variable regression has an R-Square of 0.889, explaining 88.9% of the variance. The best five-variable regression has an R-Square of 0.898, explaining 0.9 percentage points more than the four-variable regression; WFAT and EMPT improved the model's explanatory power.
Furthermore, the number of significant combinations decreases as more variables are combined. For example, in the seven-variable regression only one set (EMPT, DYSP, FEVE, CYAN, WFAT, DPCH, and CHEP) is significant. This model's R-Square increases to 0.910, an improvement over the previous model.
In conclusion, regression analysis is a powerful tool that can be leveraged to predict overlapping symptoms of CVDs. By analyzing data on patient symptoms, risk factors, and disease outcomes, regression models can identify patterns and relationships that can be used to accurately predict which patients are at risk for specific CVDs. This can help doctors and healthcare providers to more quickly and effectively diagnose and treat these conditions, ultimately improving patient outcomes and reducing the burden on the healthcare system.

VI. CONCLUSION
With more than 8000 fatalities annually, CVDs are a significant cause of mortality and disability in the United Arab Emirates. Because CVDs can present with a wide variety of symptoms, a precise diagnosis can be challenging. To address this problem, we have developed a prediction model that uses regression analysis to predict overlapping symptoms of CVDs.
To identify individuals at risk of CVDs, our suggested model combines self-reported symptoms, lifestyle variables, and test data. Regression analysis is used to find trends in the data that may be utilized to forecast the risk that a patient would develop CVD. Additionally, the model may be used to evaluate the relative hazards connected to various lifestyle choices, such as smoking and obesity.
After validation on a large dataset of patients with known CVDs, the proposed model was found to be reliable in predicting the development of CVDs in patients with comparable symptoms. The model may also be used to identify individuals who need closer monitoring because they have a greater risk of developing CVDs. Our proposed prediction model is an important tool for improving the diagnosis and treatment of CVDs and can potentially lower the mortality and morbidity associated with CVDs.

She was a Lecturer with the College of Pharmacy, Government College University Faisalabad (2009-2013). After completing her Ph.D. degree, she started her career as a Researcher with the College of Pharmacy, Gachon University, South Korea, where she is currently an Assistant Professor. She is an expert in the field of natural product chemistry, particularly in the discovery of bioactive constituents from medicinal plants against pharmacological conditions. She has published several research papers in different national and international journals. Her research interests include isolating and identifying new constituents from traditionally used medicinal plants that may have potential against various pharmacological conditions, such as cancer, diabetes, aging, and neurodegenerative diseases. She also has an ongoing interest in identifying bioactive constituents from natural resources for aging and life span extension.