Machine Learning to Estimate Workload and Balance Resources with Live Migration and VM Placement

Currently, virtualization technology in data centers often imposes an increasing burden on the host machine (HM), leading to a decline in VM performance. To address this issue, live virtual migration (LVM) is employed to alleviate the load on the VM. This study introduces a hybrid machine learning model designed to estimate the pre-copy live migration of virtual machines within the data center. The proposed model integrates the Markov Decision Process (MDP), genetic algorithm (GA), and random forest (RF) algorithms to forecast the prioritized movement of virtual machines and identify the optimal host machine target. The hybrid model achieves a 99% accuracy rate with quicker training times than previous studies that utilized K-nearest neighbor, decision tree classification, support vector machines, logistic regression, and neural networks, indicating its potential to optimize virtual machine placement and minimize downtime. The authors recommend further exploration of a deep learning (DL) approach to address other data center performance issues, and note that the practical implementation and deployment of the proposed model in real-world data centers merits further investigation.


Introduction
The development of virtualization technology in data centers (DCs) in Indonesia has been substantial in recent years, coinciding with the evolution of application-based business models [1,2]. Virtualization technology is utilized by various virtual service providers worldwide, such as Azure Virtual Machine (VM), Windows VM, and Linux VM. It accounts for 40% of data center (DC) usage, with a split of 45% for public and private hybrid usage and only 15% for virtual machines (VMs) in public clouds. This underscores the importance of virtual machines in industry and business [3]. Virtualization technology is widely applied in data centers as a server resource capable of storing business information files and interconnecting servers [4,5]. It has significantly impacted data center administrators, especially in the efficient utilization of the large number of hardware servers in data centers [6].
Virtual machine (VM) technology is one example of server virtualization. Issues encountered on virtual machines include overloading and the full availability of the host machine (HM) [7]. These problems are common in data centers (DCs), where performance must be consistently optimal to support business activities and other applications. Addressing these complex issues therefore requires an appropriate approach. An indispensable strategy in this context is live virtual migration (LVM), one of the goals of which is to balance the workload on virtual machines and address other related issues [8].
LVM is one solution for maintaining optimal data center performance without disrupting the processes running on virtual machine applications. The purpose of LVM is to facilitate the movement of virtual machines from one host machine (HM) to another without affecting the running application on the virtual machine [9]. The objectives include improving the integrity of the overall system, managing resources, and enabling maintenance without downtime [10]. In the LVM process, it is ideal to accurately monitor resource conditions in the data center and the workload patterns of each virtual machine (VM). This helps in determining the target host machine (HM) to ensure a successful migration of the VM without any VM downtime [11]. Several previous studies have sought optimal solutions to this problem. One approach involved optimizing LVM to minimize downtime in the data center, migration traffic in the network, or other issues that could degrade virtual machine (VM) performance. In that investigation, the workload primarily utilized VM memory, and the method employed was the Optimized Pre-copy Algorithm (OPCA), which concentrated on network content during VM migration [12]. Another study also utilized OPCA but focused solely on the virtual machine memory workload. Additional research by Zhang et al. suggested that LVM aimed to redistribute resources between host machines (HMs) to balance workloads in data centers. This was accomplished by employing a genetic algorithm (GA) to forecast high-productivity host machines based on VM workloads [13]. However, performing live migration without considering the appropriate timing may result in an imbalanced host machine (HM) workload. This aligns with Satpathy et al., who emphasized scheduling virtual machine (VM) migration to prevent VMs from carrying excessive workloads [14]. Moreover, machine learning is considered a pivotal approach capable of adapting learning systems to recognize data patterns as knowledge [15]. A study by Kumar et al. in 2022 suggests that machine learning approaches can simplify the decision process for determining live virtual machine migration [16]. In 2023, Haris et al. stated that utilizing machine learning with the K-nearest neighbor (KNN) algorithm for LVM achieved a success rate of 95% with improved pre-copy migration [17]. The main contributions of this research are as follows:
• The authors propose a novel hybrid machine learning model that combines the Markov Decision Process (MDP), genetic algorithm (GA), and random forest (RF) algorithms to estimate the pre-copy live migration of virtual machines (VMs) in a data center.
• The proposed model can predict which virtual machines (VMs) will be moved first and determine the optimal host machine target to optimize VM placement and minimize downtime.
• The authors compare the proposed model with existing state-of-the-art machine learning algorithms, such as K-nearest neighbor, decision tree classification, support vector machines, logistic regression, and neural networks. The results show that the proposed model achieves 99% accuracy with faster training times than previous studies.
• The authors suggest further research using a deep learning (DL) approach to address other issues related to data center performance.
Overall, the article presents a promising strategy for enhancing virtual machine migration in data centers. The hybrid model showcases higher accuracy and quicker training times compared to prior studies, suggesting its potential to optimize virtual machine placement and minimize downtime. The authors also emphasize the significance of evaluating data center performance and propose further research. Moreover, more details on the practical implementation and deployment of the proposed model in real-world data centers would be beneficial.
In this research, the authors attempted to enhance the machine learning model by utilizing the Markov Decision Process (MDP), one of the algorithms frequently used in machine learning (ML) to generate optimal policies. Another application of MDP can be seen in the study conducted by Guo et al. in 2020 [18]. That study concludes that the MDP can streamline path problems in the Hybrid Flow Shop Scheduling Problem (HFSSP) domain, reducing their complexity, showcasing significant potential in the field, and conserving computational resources for efficient optimization. Based on various research findings, MDP emerges as a viable option for delivering optimal solutions in intricate decision-making scenarios. To further enhance the MDP approach, additional algorithms are required to augment the decision-making process, particularly in the live virtual migration (LVM) process. This was highlighted in the research conducted by Ajmal et al. in 2021 [19], which suggests that implementing a genetic algorithm (GA) can reduce the workload on a virtual machine (VM). GA is shown to reduce unbalanced VM workloads by assigning tasks to groups, detecting the workload of the overloaded VM, and scheduling to manage the VM. Therefore, building on the research conducted by Haris et al. in 2023, this study aims to improve LVM consolidation through a hybrid machine learning approach based on MDP, GA, and random forest (RF). This is intended not only to increase accuracy but also to optimize decision making in the LVM process and to address issues in VM migration scheduling. Beyond improving the accuracy of machine learning in live migration decision making, our goal is to develop rules for LVM that balance DC workloads. To achieve this, we consider the balance of the VM workloads, determined by CPU and memory, as a key factor, and we prioritize LVM scheduling for migration.
The content of this study follows a systematic structure. The second section highlights several related works. The third section describes the research method and data collection. The fourth section presents and discusses the results. The fifth section outlines the conclusions, limitations, and recommendations for further study.

Related Works
In this section, we outline the theories related to our research and the research opportunities we will pursue. A live migration model aimed at maintaining the performance of virtualization technology, known as auto-scaling, is an optimization strategy to adjust resource capacity in cloud computing based on workload demand fluctuations. This approach enables applications to stay responsive and available during traffic spikes or load reductions [20]. On the other hand, live virtual migration (LVM) is the technique utilized in data center virtualization for moving VMs from one host machine (HM) to another without disrupting the running applications. This allows for maintenance, resource optimization, and high availability in virtualized environments. Both concepts play a significant role in maintaining the performance and availability of current computing systems [21]. In this investigation, the authors focus exclusively on live virtual machine migration in data centers [22]. For example, in a single LVM process, only one VM is involved at a time, while group migration allows the movement of a set of interdependent VMs together [23,24]. This approach can enhance the consistency and coherence of applications during migration. Furthermore, migration can also be classified according to the state of the virtual machine workload, including memory, CPU, disk, and other settings, from the source host machine to the destination host machine [3]. Each classification in VM migration has different implications depending on the specific needs and environment [25,26]. Additionally, workload characteristics and VM management policies must be considered when determining the appropriate placement. Other factors, such as network traffic, storage requirements, and application-based separation policies, can also influence the placement of virtual machines.
In this regard, the impact on the network should be considered, including the location of the virtual machine relative to the network nodes and its usage, to optimize latency and network performance [7,27]. Security and compliance aspects are also crucial considerations in VM placement, requiring security policies and compliance requirements to be enforced in the chosen VM placement. By considering these factors, the right placement algorithm can be used for live virtual machine migration, thus achieving optimal migration success. The authors identify several challenges in VM migration that need to be considered. During the migration of virtual machines, the primary goal is to complete the migration on time and place the virtual machine on the appropriate host. To achieve this, an effective model is needed to calculate VM migration based on the service level agreement (SLA) for CPU and RAM by determining the correct VM placement to improve VM performance [28,29].

Previous Studies on Live Migration Technology
The authors identified a significant gap in the LVM improvement studied by Haris et al. (2023) using machine learning. We opted to enhance the machine learning model with a hybrid approach, optimizing LVM using the Markov Decision Process (MDP) and genetic algorithm (GA) for decision making and LVM scheduling selection. This is consistent with the research conducted by Guo et al. in 2020, which notes that the MDP is a formalism commonly utilized in reinforcement learning to determine the best policy [18]. However, this research aims to go further by developing a hybrid machine learning model to identify VMs with higher workloads and to present detailed information on LVM preparation. It involves selecting which VM instances should be migrated and determining the optimal time for LVM execution, guided by optimal HM objectives. In this study, the use of the MDP algorithm aligns with the research conducted by Alqarni et al. (2023), where problem-solving involves the reallocation of resources [30]. This hybrid ML approach can significantly improve the prediction of overloaded VM issues and the selection of scheduling processes in LVM. Table 1 summarizes previous research on LVM processes.

Table 1. Previous studies on LVM processes.

Authors | Research Methods | Workload | Prediction Targets | Key Findings
[12] | The pre-copy method | CPU memory | Content migration | There were no selection techniques for VM migration based on workloads.
[30] | Re-initialization and decomposition-whale optimization algorithm | CPU and memory | Resource utilization | Only energy usage and migration costs were considered; no specific virtual machine workloads were used.
[31] | Support vector regression | Memory, bandwidth, and CPU | Host utilization | It was limited to evaluating host usage and not suitable for evaluating the performance of VM migrations, although its precision was higher than that of the other models.
[32] | Machine learning-based downtime optimization (MLDOM) | CPU, memory, and network utilization | Downtime | The downtime reduction only applied to VM migration over the network environment.
[33] | Artificial neural network | A large number of factors | Bandwidth usage and CPU utilization | Performance prediction for pre-copy migration was not affected by increasing data center efficiency.
[34] | Genetic algorithm | CPU, memory, and network | CPU utilization and bandwidth | Fast VM placement was implemented.
[35] | Forecasting technique | Disk, memory, and CPU | Disk, memory, and CPU | Workload prediction was obtained to minimize VM migration.
[36] | Machine learning | CPU | CPU utilization | K-nearest neighbor (KNN) and decision tree classification algorithms were compared to determine the number of VMs migrating, with an accuracy of 90.9%.
The Markov Decision Process (MDP) is an algorithm for decision-making processes and can represent an environment. Some prediction problems can be formulated using MDP equations. MDP is the traditional formalization of sequential decision making, in which decisions affect subsequent stages through delayed future rewards in addition to current benefits [37,38]. Thus, MDP is associated with delayed rewards and the need to weigh these rewards over time [39].
In this investigation, the authors used the Markov Decision Process (MDP) to model the interaction between the learning agent and the environment. Within the multi-agent system setting, the MDP can be represented as a five-tuple M = (S, A, R, T, γ), where S is the state space, A is the action space, R : S × A → R is the reward function, T : S × A × S → [0, 1] is the transition function, and γ is the discount factor. The agent selects an action based on its policy π_θ : S → A and receives a reward r; the environment then transitions to the next state according to T. In a multi-agent environment, O = O_1 × . . . × O_N is the combination of local observations for N agents, and A = A_1 × . . . × A_N is the joint action set [40]. Meanwhile, the reward and transition functions are subject to change over time. The authors aim to maximize the expected cumulative discounted reward under the policy, J(θ) = E_{π_θ}[ Σ_t γ^t r_t ].
According to scholars, the genetic algorithm (GA) has become a search technique that can produce approximate solutions to optimization and search problems. This algorithm utilizes operators such as selection, mutation, and crossover [42]. GAs may offer solutions that strike a balance between accuracy and efficiency trade-offs. The authors aim to address scheduling in virtual machine placement using a genetic algorithm approach [43]. Initially, the authors propose an arrangement, denoted as a decision variable d, generated with first-fit and best-fit algorithms; these arrangements are considered the initial population. Most job-scheduling problems are known to be NP-complete [44]. During the selection process, the solution with the best fitness is chosen, while the others are discarded [45,46].
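To make the five-tuple M = (S, A, R, T, γ) concrete, the following minimal sketch solves a toy MDP by value iteration. The two host states, two actions, and all transition and reward numbers here are hypothetical illustrations of ours, not values from this study.

```python
import numpy as np

# Hypothetical toy MDP: states 0 = "balanced", 1 = "overloaded";
# actions 0 = "stay", 1 = "migrate". All numbers are illustrative only.
S, A = 2, 2
gamma = 0.9  # discount factor for delayed rewards

# T[s, a, s'] = transition probability; R[s, a] = immediate reward
T = np.array([
    [[0.9, 0.1], [0.8, 0.2]],   # from "balanced"
    [[0.1, 0.9], [0.7, 0.3]],   # from "overloaded"
])
R = np.array([
    [1.0, -0.2],                # staying on a balanced host is rewarded
    [-1.0, 0.5],                # migrating off an overloaded host is rewarded
])

# Value iteration: V(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V(s') ]
V = np.zeros(S)
for _ in range(500):
    Q = R + gamma * (T @ V)     # Q[s, a]; T @ V sums over the next state s'
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)       # greedy policy: best action per state
```

With these numbers the converged policy is to stay on a balanced host and migrate off an overloaded one, which matches the intuition behind using MDP for LVM decisions.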
Within the selection process, the first step is to calculate the selection probability of each chromosome. The selection probability of chromosome i is calculated using Equation (1):
p_i = f_i / (f_1 + f_2 + . . . + f_N), (1)
where f_i is the fitness of chromosome i and N is the population size.
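A minimal sketch of this fitness-proportionate (roulette-wheel) selection, assuming Equation (1) has the standard form p_i = f_i / Σ_j f_j; the function names are our own:

```python
import random

def selection_probabilities(fitness):
    """Equation (1): p_i = f_i / sum of all fitness values."""
    total = sum(fitness)
    return [f / total for f in fitness]

def roulette_select(population, fitness, rng=random):
    """Spin the wheel: pick the chromosome whose segment contains the draw."""
    probs = selection_probabilities(fitness)
    r = rng.random()
    cumulative = 0.0
    for individual, p in zip(population, probs):
        cumulative += p          # segment widths are the probabilities
        if r <= cumulative:
            return individual
    return population[-1]        # guard against floating-point round-off
```

Fitter chromosomes occupy wider segments of the wheel and are therefore chosen more often, while every chromosome retains a nonzero chance of selection.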
The selection probabilities are calculated and the wheel is then divided into segments whose widths are proportional to the selection probabilities. The final step involves generating a random number and selecting the chromosome from the segment that contains that number [47]. In this genetic algorithm, crossover is used to produce new offspring from the individuals chosen in the selection step. Various crossover methods are discussed in the literature. In studies using two-point crossover, the chromosome is initially divided into segments [48]. The locations of the cuts on the chromosome are randomly selected, as represented in Equation (2):
C1, C2 = rand(2) × (TK − 1) + 1, (2)
where C1 and C2 are the two intersection points, TK is the chromosome length, and rand(2) generates two random numbers so that each cut falls between one and the full length of the chromosome. The two chromosomes then exchange the genes between these cuts, producing two new chromosomes. The crossover details are illustrated in Figure 1 [49].
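A minimal sketch of two-point crossover as described by Equation (2); the function name and the use of Python's random module are our own choices:

```python
import random

def two_point_crossover(parent1, parent2, rng=random):
    """Equation (2): pick two cut points in [1, TK - 1] and swap the middle."""
    TK = len(parent1)                             # chromosome length
    c1, c2 = sorted(rng.sample(range(1, TK), 2))  # two distinct cut points
    # Exchange the gene segment between the cuts to form two offspring.
    child1 = parent1[:c1] + parent2[c1:c2] + parent1[c2:]
    child2 = parent2[:c1] + parent1[c1:c2] + parent2[c2:]
    return child1, child2
```

Each child keeps its parent's outer segments and inherits the middle segment from the other parent, so both offspring mix genetic material from both parents.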
Mutation operations introduce new features on chromosomes, namely exchange mutations, uniform mutations, and percussion mutations. In this strategy, a gene position is chosen arbitrarily, and its value is replaced by the value of another gene [50]. To ensure that every position has an equal chance to introduce new features, the starting and ending positions are first selected, and the genes at those positions are swapped. In the next iteration, the start position is incremented and the end position decremented, and the genes at these positions are exchanged, and so on. An illustration can be seen in Figure 2 [51].
The random forest (RF) algorithm is a popular learning algorithm used in regression and classification scenarios. The algorithm combines multiple decision trees (DTs) to generate predictions, resulting in more accurate outcomes. RF can also be employed to assess feature importance and aid in feature selection [52]. However, because multiple decision trees must be built, the training phase of RF can be computationally expensive. In DT-based algorithms like RF [53,54], the Gini impurity metric assesses the impurity of a set of instances based on their class labels. The goal is to identify features that minimize the Gini index, indicating higher homogeneity among the class labels within a node [55]. The Gini impurity can be mathematically calculated using Equation (3):
Gini(D) = 1 − Σ_i p_i², (3)
where p_i is the proportion of instances in D belonging to class i.
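A brief sketch of the exchange mutation and the Gini computation of Equation (3), assuming the standard form Gini(D) = 1 − Σ p_i²; the helper names are ours, not the authors':

```python
from collections import Counter

def swap_mutation(chromosome, i, j):
    """Exchange mutation: swap the genes at positions i and j."""
    mutated = list(chromosome)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

def gini_impurity(labels):
    """Equation (3): Gini(D) = 1 - sum_i p_i^2 over class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```

A pure node (all labels identical) has Gini 0, while an evenly split binary node has Gini 0.5; a decision tree picks the split that minimizes this impurity.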

Methods
In this section, we outline the objectives of our research and then define the problem, as shown in Figure 3. This research aimed to develop a hybrid machine learning model for VM migration utilizing MDP and GA to enhance the accuracy of the previous machine learning model and to improve the efficient distribution of workloads in data centers (DCs), allowing them to operate optimally. The effectiveness of our approach was evaluated using two key performance indicators. Deciding whether to relocate a VM to the target HM is a crucial step in supporting LVM.
Figure 3 illustrates the stages of the development of the hybrid machine learning model. We gathered DC workload data related to CPU, memory, and network and assessed the workload on each VM based on CPU and memory. These data were then processed using the MDP method, and the expected outcomes were used to evaluate the VM workload on each HM, serving as a reference in the LVM process. Subsequently, to determine the suitable HM, we applied the GA and random forest approaches to the existing virtual machine workload. To calculate the migration value, we introduced the network variable to identify virtual machines with high network traffic and transfer them to the destination HM using the GA, ensuring an optimal migration process. Following the genetic algorithm processing, we incorporated a random forest algorithm to identify the appropriate HM in the LVM procedures. In the GA and RF processes, to determine the VM placement, we mapped the VMs to be migrated after obtaining the results. Figure 4 provides an example to clarify the definition of the VM coverage and its relation to the HM.
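The staged pipeline can be sketched end to end as follows. Everything here is a simplified stand-in of our own making: the scoring weights, the 0.8 threshold, and the least-loaded target choice are hypothetical placeholders for the MDP, GA, and RF stages, not the authors' implementation.

```python
# Hypothetical end-to-end sketch of the staged LVM pipeline described above.

def workload_score(cpu, mem, w_cpu=0.5, w_mem=0.5):
    """Stage 1 stand-in: scalar workload per VM from CPU and memory usage."""
    return w_cpu * cpu + w_mem * mem

def select_vms_to_migrate(vms, threshold=0.8):
    """Stage 2 stand-in: flag overloaded VMs, highest workload first."""
    scored = [(workload_score(v["cpu"], v["mem"]), v["name"]) for v in vms]
    return [name for score, name in sorted(scored, reverse=True) if score > threshold]

def pick_target_host(hosts):
    """Stage 3 stand-in: choose the least-loaded host as the migration target."""
    return min(hosts, key=lambda h: h["load"])["name"]

vms = [
    {"name": "vm1", "cpu": 0.95, "mem": 0.90},
    {"name": "vm2", "cpu": 0.30, "mem": 0.40},
    {"name": "vm3", "cpu": 0.85, "mem": 0.88},
]
hosts = [{"name": "hm1", "load": 0.9}, {"name": "hm2", "load": 0.4}]

migrate = select_vms_to_migrate(vms)
target = pick_target_host(hosts)
```

Running this sketch flags vm1 and vm3 for migration (their combined CPU/memory scores exceed the threshold) and selects hm2, the less-loaded host, as the target.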

Evaluation Model for Machine Learning
In this study, performance was evaluated using the confusion matrix, accuracy, and the F1 score. Each of these metrics is discussed below.
Classification performance can be assessed using a specific table structure known as a confusion matrix (attributed to Karl Pearson), or alternatively an error matrix. In unsupervised learning, this structure is referred to as a matching matrix; in the supervised setting, it is commonly known as a confusion matrix [56,57]. Each row of the matrix represents actual occurrences of a class, while each column represents instances of a predicted class. The confusion matrix is expressed in terms of false positives (FPs), false negatives (FNs), true positives (TPs), and true negatives (TNs); TP + TN + FP + FN is the total number of instances. Table 2 explains the concept of a confusion matrix.


Table 2. Confusion matrix.

                | Predicted Positive   | Predicted Negative
Actual Positive | True positives (TPs) | False negatives (FNs)
Actual Negative | False positives (FPs) | True negatives (TNs)

True positives (TPs) are positive instances correctly classified by the classifier; TP is the count of true positives. True negatives (TNs) are negative instances correctly classified by the classifier; TN is the count of true negatives. False positives (FPs) are negative tuples erroneously classified as positive; FP is their count. False negatives (FNs) are positive tuples erroneously classified as negative; FN is their count.
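From these four counts, the evaluation metrics used in this study follow directly; a minimal sketch, using the standard definitions of accuracy and F1:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all instances classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: F1 = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)
```

For example, a classifier with TP = 90, TN = 85, FP = 10, and FN = 15 has an accuracy of 87.5%, while its F1 score weighs the false positives and false negatives without crediting the true negatives.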
Table 3 shows the results of the comparison among the various machine learning (ML) procedures used in live virtual migration (LVM). The models underwent training and testing on the same dataset, and the duration of each phase was recorded, along with the accuracy of each model during both the training and testing phases. It is evident from Table 3 that the random forest (RF) model exhibits quick training and testing times, with a training accuracy of 100% and a testing accuracy of 93%. Therefore, random forest (RF) emerges as the fastest and most efficient model, followed by the support vector machine (SVM) model, which achieved a high accuracy of 92%, albeit with a longer training time. The quickest times were recorded by K-nearest neighbors (KNNs) and decision tree (DT), but with a lower accuracy of 91% compared to RF and SVM. Subsequently, the RF model was integrated with the other algorithms.
Table 4 compares various machine learning algorithms for live virtual machine (VM) migration based on their accuracy in managing different allocation scenarios. The four approaches compared are K-nearest neighbors (KNNs); decision tree classification with KNN; a support vector regression (SVR) model; and linear regression (LR) with neural networks (NNs). The first row of the table indicates that the KNN algorithm achieved a test accuracy of 95.0% for optimizing the pre-copy migration scenario. The second row shows that the decision tree classification and KNN algorithm achieved a test accuracy of 90.90% for forecasting the workload of the VM migration scenario.

Algorithm of Machine Learning | Allocation Scenario Test Result | Test Accuracy (%)
[17] K-nearest neighbors (KNNs) | The optimization of pre-copy migration | 95.00
[36] Decision tree classification and K-nearest neighbors (KNNs) | The forecast of the workload of VM migration | 90.90
[58] Support vector regression (SVR) model | The optimization of VM live migration | 94.61
[59] Linear regression (LR) and neural network (NN) | The optimal selection of VM performance | 97.80

The third row demonstrates that the SVR model achieved a test accuracy of 94.61% for optimizing the VM live migration scenario. Lastly, the fourth row reveals that the LR and NN algorithms achieved the highest test accuracy of 97.80% for selecting optimal VM performance scenarios. Overall, the results in Table 4 suggest that the LR and NN algorithms may be the most effective machine learning approaches for live VM migration in terms of accuracy, especially for scenarios involving the optimal selection of VM performance. However, it is important to note that the performance of these algorithms may vary depending on the specific characteristics of the data and the underlying infrastructure, so further testing and validation may be necessary before deciding which algorithm to use in a given context.

Results
The live migration of VMs is supported in various configuration scenarios depending on which assets need to be relocated. The arrangement of relocation methods also includes distinct components that determine the type of movement. In this section, we delve into the setup and types of live relocation, as illustrated in Figure 4.

Figure 4 illustrates the configuration of a live migration model for a virtual machine (VM) in a cloud computing environment. The model consists of three main components: the source host, the target host, and the VM being migrated. The source host is the physical server where the VM is currently running; it holds the VM's memory, CPU, and storage resources and continues to run the VM until the migration is complete. The target host is the physical server to which the VM will be migrated. It has the same hardware specifications as the source host and is responsible for receiving the VM's memory, CPU, and storage resources from the source host and starting the VM on its own hardware. The VM being migrated is the virtual machine running on the source host that will be moved to the target host; its memory, CPU, and storage resources are copied from the source host to the target host during the migration process.

The live migration process begins with the creation of a snapshot of the VM's memory, CPU, and storage resources on the source host. This snapshot is then transferred to the target host over the network while the VM continues to run on the source host. Once the snapshot is fully transferred, the VM's CPU and memory resources are stopped on the source host and restarted on the target host. The VM's storage resources are then synchronized between the source and target hosts to ensure data consistency. Finally, the VM is restarted on the target host, and the migration process is complete. Overall, the live migration model configuration shown in Figure 4 enables the seamless migration of a VM from one physical server to another without downtime or disruption to the VM's applications or services.
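The iterative pre-copy transfer described above can be sketched as a small simulation. The page count, per-round dirty rate, and stop threshold below are hypothetical values chosen for illustration, not measurements from the paper.

```python
# Minimal sketch of the pre-copy migration loop: send the dirty page set
# repeatedly while the VM keeps running, then pause for a short stop-and-copy
# round once the dirty set is small enough. All parameters are hypothetical.
def pre_copy_migrate(total_pages=1000, dirty_rate=0.10,
                     stop_threshold=50, max_rounds=30):
    dirty = total_pages          # round 0: every page must be sent
    transferred = 0
    rounds = 0
    while dirty > stop_threshold and rounds < max_rounds:
        transferred += dirty                 # send the current dirty set
        dirty = int(dirty * dirty_rate)      # pages re-dirtied while sending
        rounds += 1
    transferred += dirty                     # final stop-and-copy (VM paused)
    return rounds, transferred, dirty

rounds, transferred, downtime_pages = pre_copy_migrate()
print(rounds, transferred, downtime_pages)   # → 2 1110 10
```

With these values the loop converges in two iterative rounds, sends 1110 pages in total, and pauses the VM only for the final 10 pages, which is why pre-copy keeps downtime low.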
The proposed algorithm combines three algorithms: the hyperparameters of MDP, GA, and RF are combined. Table 5 displays the hybrid combination algorithm, and Table 6 presents the results of the testing dataset. Combining the MDP, GA, and RF algorithms offers several advantages, as each algorithm can compensate for the weaknesses of the others: MDP can model the probability between the features and target variables, GA can provide the best features, and RF can make decisions on selecting the best features. By implementing this hybrid ML approach, we can identify which VM places the greatest workload on the HM and determine the optimal time for the LVM process, keeping the DC performance consistent. Figure 5 provides an example to illustrate the definition of the VM coverage on the HM.

Figure 5 illustrates a test scenario for VM placement and migration using the GWA-T-12 Bitbrains dataset [60,61]. In this scenario, a migration process is conducted for the HMs experiencing overload, utilizing datasets from Bitbrains clouds. The availability of resources at the destination HM, such as CPU, memory, and disk space, is considered when determining the VM placement. Moreover, equipment compatibility between the source and destination is assessed to ensure proper VM operation post-migration. Figure 6 displays the results of processing the dataset for this test scenario. Due to the lengthy loading process in GA, only 156 MB of dataset resources are utilized. The live migration network is not executed because network simulation via a router and the Internet would be required.
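One way the GA and RF stages can interact is shown in the sketch below: a simple genetic algorithm searches over binary feature masks, using the cross-validated accuracy of an RF classifier as the fitness function. This is an illustrative pipeline on synthetic data; the paper's exact hybrid (including the MDP stage and its hyperparameters) is not reproduced here.

```python
# Sketch of GA feature selection wrapped around an RF fitness function.
# Population size, generations, and mutation rate are illustrative choices.
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

random.seed(0)
X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=4, random_state=0)
N_FEATURES = X.shape[1]

def fitness(mask):
    """RF cross-validated accuracy on the feature subset selected by `mask`."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    rf = RandomForestClassifier(n_estimators=25, random_state=0)
    return cross_val_score(rf, X[:, cols], y, cv=3).mean()

# Simple generational GA: truncation selection, one-point crossover, mutation.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(8)]
for _ in range(3):
    parents = sorted(pop, key=fitness, reverse=True)[:4]
    children = []
    while len(children) < len(pop):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, N_FEATURES)
        child = a[:cut] + b[cut:]              # one-point crossover
        if random.random() < 0.2:              # bit-flip mutation
            i = random.randrange(N_FEATURES)
            child[i] ^= 1
        children.append(child)
    pop = children

best = max(pop, key=fitness)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```

The division of labor matches the rationale above: GA explores the feature space, while RF evaluates and ultimately decides on each candidate subset.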

The Result of the Virtual Machine on the Dataset
In Figure 6, we tested the dataset that had been created and then evaluated it with the hybrid ML algorithm. The results indicate that hybrid machine learning can effectively detect workloads from virtual machines, particularly when hosts are under heavy workloads. The test dataset comprised samples from 30 virtual machines distributed across HM 1, HM 2, and HM 3, totaling 284,234 records. In Tables 6 and 7, we assessed the algorithm. In Testing 1, we forecasted the workload on the HM that would undergo migration using the MDP approach, while in Table 6 we determined the workload of the HM on the VM that would undergo LVM. After predicting the workload on the HM, we conducted tests using the GA and RF algorithms with varying VM workloads and network traffic to optimize the LVM process. The results from Test 2 show changes in HM 1, HM 2, and HM 3, indicating a decrease in the workload on the HMs. The outcomes of Test 1 and Test 2 are influenced by the VM workload conditions.
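Once an overloaded HM and a candidate VM are identified, a destination must still be chosen that has spare CPU, memory, and disk capacity, as described for the Figure 5 scenario. The sketch below shows one simple feasibility check; the host names, capacities, and 80% utilization limits are hypothetical, not values from the dataset.

```python
# Hypothetical target-host selection for an LVM candidate: pick the first
# host whose CPU and memory headroom can absorb the VM. Thresholds and
# capacities are illustrative only.
def pick_target_host(vm, hosts, cpu_limit=0.80, mem_limit=0.80):
    """Return the name of the first host that can fit `vm`, else None."""
    for host in hosts:
        fits_cpu = host["cpu_used"] + vm["cpu"] <= cpu_limit * host["cpu_cap"]
        fits_mem = host["mem_used"] + vm["mem"] <= mem_limit * host["mem_cap"]
        if fits_cpu and fits_mem:
            return host["name"]
    return None  # no feasible destination; defer the migration

hosts = [
    {"name": "HM1", "cpu_cap": 32, "cpu_used": 30, "mem_cap": 128, "mem_used": 120},
    {"name": "HM2", "cpu_cap": 32, "cpu_used": 10, "mem_cap": 128, "mem_used": 40},
]
vm = {"cpu": 4, "mem": 16}
print(pick_target_host(vm, hosts))  # HM1 is saturated, so HM2 is chosen
```

A production placement policy would also weigh disk space, network traffic, and hardware compatibility, as the text notes; this sketch covers only the CPU/memory feasibility step.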
The results of the tests in Test 1 and Test 2 can be seen in Figure 7. The developed algorithm runs optimally.

Discussion
The discussion section of the paper provides a detailed analysis of the results obtained from testing the developed LVM placement process. The authors performed several tests and algorithm comparisons before combining the developed algorithms. They found that the KNN algorithm had accuracies of 92% and 91% with longer training times, while the DT algorithm and logistic regression had faster training times but lower accuracy values of 91% and 89%. In contrast, the RF algorithm had a high accuracy of around 93% and a faster training time than the SVM algorithm. Therefore, the authors decided to combine the RF algorithm with MDP and GA, resulting in a precision value of 99% and an F1 score of 98%, indicating that the developed hybrid machine learning model was more optimal than the other algorithms in the LVM case. Table 8 summarizes the results of the ML algorithms for live VM migration, showing that the proposed MDP + GA + RF algorithm achieved the highest accuracy of 99.00% in the live migration VM placement scenario, outperforming algorithms such as KNN, DT, logistic regression, and SVM. Overall, the results of the tests conducted in the study prove that the proposed hybrid machine learning algorithm is superior to past research, achieving high accuracy and faster training times in the LVM placement scenario, as can be seen in Table 8.
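The precision and F1 figures quoted above are standard classification metrics. For reference, the sketch below shows how they are computed with scikit-learn on made-up label vectors (not the paper's actual predictions).

```python
# Computing precision and F1 on illustrative binary labels.
from sklearn.metrics import f1_score, precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # ground-truth migration decisions
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # model's predictions (one false positive)

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 6/7
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
print(f"precision={precision:.3f}, F1={f1:.3f}")
```

Here recall is perfect (every true positive is caught), so F1 is pulled down only by the single false positive; the paper's 99% precision and 98% F1 imply both error types were rare.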

Conclusions
The paper presents a hybrid machine learning model for estimating the pre-copy live migration of virtual machines in a data center. The proposed model combines the Markov Decision Process (MDP), genetic algorithm (GA), and random forest (RF) algorithms to predict the order of virtual machine migration and identify the optimal host machine target. The hybrid model achieved 99% accuracy with faster training times compared to previous studies using K-nearest neighbor, decision tree classification, support vector machines, logistic regression, and neural networks. The authors suggest further research into a deep learning (DL) approach to address other data center performance issues.

Figure 3
Figure 3 is a flowchart that outlines the research process for the study on live virtual machine (LVM) migration using a hybrid machine learning approach. The process is divided into five main stages: introduction, literature review, data collection, model development, and conclusion and future work. The introduction stage defines the problem and motivation for the study, provides background on LVM and machine learning, and states the research objectives and questions. The literature review stage summarizes previous LVM and machine learning studies, identifies gaps and limitations in current research, and formulates hypotheses for the study. The data collection stage describes the data sources and collection methods, explains data preprocessing and cleaning, and presents the final dataset used in the study. The model development stage introduces the evaluation metrics used to assess the model's performance, compares the results of the hybrid machine learning approach with those of previous studies, and discusses the model's limitations and potential improvements. Finally, the conclusion and future work stage summarizes the study's main findings and contributions, discusses the study's implications for data center performance and management, and suggests directions for future research on LVM and machine learning in data centers. Overall, Figure 3 provides a clear and concise visual representation of the research process, highlighting the key stages and their corresponding activities.

This research aimed to analyze the development of a hybrid machine learning model for VM migration utilizing MDP and GA to enhance the accuracy of the previous machine learning model and improve the efficient distribution of workloads in data centers (DCs), allowing them to operate optimally. The effectiveness of our approach was evaluated using two key performance indicators to explain the method used and further enhance the LVM method. Deciding whether to relocate the VM to the target HM is a crucial step in supporting LVM. Figure 3 illustrates the stages of the development of the hybrid machine learning model. We gathered the DC workload data related to CPU, memory, and network usage and assessed the workload on each VM based on CPU and memory. These data were then processed using the MDP method, and the expected outcomes were used to evaluate the VM workload on each HM, serving as a reference.
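The workload-forecasting step can be illustrated with a toy Markov model of host workload states (low / medium / high). This is a plain Markov chain rather than the paper's full MDP with actions and rewards, and the transition probabilities and trigger threshold are invented for the example.

```python
# Toy Markov chain over HM workload states: project the state distribution a
# few steps ahead and use the probability of "high" as a migration trigger.
# Transition matrix and threshold are hypothetical.
import numpy as np

states = ["low", "medium", "high"]
P = np.array([
    [0.7, 0.2, 0.1],   # from low
    [0.2, 0.5, 0.3],   # from medium
    [0.1, 0.3, 0.6],   # from high
])

dist = np.array([1.0, 0.0, 0.0])   # the host starts in the "low" state
for _ in range(3):                 # look three steps ahead
    dist = dist @ P

p_high = dist[states.index("high")]
print(f"P(high workload in 3 steps) = {p_high:.3f}")
if p_high > 0.20:                  # hypothetical trigger threshold
    print("schedule LVM for this host's heaviest VM")
```

In the study, the expected workloads produced by this kind of forecast serve as the reference for deciding which VM to migrate and when, before the GA and RF stages refine the placement.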

Figure 5 .
Figure 5. Scenario Test 1: VM placement and migration (HM1, HM2, and HM3). Blue indicates the condition of the VMs in the HMs, and green indicates the migrated VMs.

Figure 6 .
Figure 6. The result of the virtual machine on the dataset.

Figure 7 .
Figure 7. The results of Test 1 and Test 2.

Table 1 .
Deploying virtual machines in live migration.

Table 4 .
Comparison of machine learning algorithms in live VM migration.

Table 5 .
Performance of the proposed hybrid model (%).

Table 6 .
Results of Dataset Testing 1.

Table 7 .
Results of Dataset Testing 2.

Table 8 .
Results of ML algorithm for live migration of VMs.