A Predictive Multi-Tenant Database Migration and Replication in the Cloud Environment

With the rapid adoption of multi-tenant databases, cloud providers consolidate multiple tenants' databases on shared server machines, where the tenants share a common application and database instances. To ensure the quality of service (QoS) for the leased resources, the user and the provider agree on a Service Level Agreement (SLA). Frequent SLA violations result in high contractual penalties and increase the possibility of losing the tenant. In addition, the irregular workload patterns of each tenant's transactions require seamless adjustments to sudden load changes and variability. As a result, to satisfy tenants' availability and performance requirements simultaneously, it is necessary to perform reliable tenant migration and replication that distributes the workload over a flexible set of sites and avoids SLA violations. In this research, a cluster-based multi-tenant database management system (CB-MT DBMS) is proposed, which takes migration and replication decisions in advance by monitoring and acting before an SLA violation occurs. In addition, a dynamic proactive multi-tenant database migration and replication (MTDB-MR) algorithm is proposed to reduce collisions and inconsistencies between migration and replication decisions for a group of violated tenants. Experimental results show that, compared to previous algorithms, the proposed MTDB-MR algorithm reduces the total number of SLA violations, the number of multi-tenant client SLA violations, the average response time of client sites, and the total execution time of each multi-tenant client site.


I. INTRODUCTION
In a multi-tenant SaaS architecture, the tenants subscribe to a shared database to store their data. Thus, different performance can be achieved depending on the design of the data layer. Previously, various designs have been proposed for the multi-tenant data layer [5] [17]. The main difference between these designs is the level of separation of tenant data. Regardless of the layout of the data layer used in a multi-tenant structure, service providers face a difficult daily scenario. The tenants require strict guarantees for the performance and availability of the rented services, known as performance service level agreements (SLAs) [7][11] [18]. A performance SLA is an agreement between tenants and service providers that sets the minimum performance and availability requirements for a rented service. It also defines penalties if the SLA is violated. On the other hand, for service providers to make a profit, they must reduce operating costs and utilize most of their hardware and software resources [33] [42]. However, in multi-tenant environments [19], each tenant needs only a small portion of a single node's resources. So, the degree of multi-tenancy on each node is very high, which makes meeting the SLAs both difficult and crucial. Consequently, the cloud service provider should have an intelligent multi-tenant data storage system with efficient mechanisms for allocation, migration, and replication of the multi-tenant data. The main requirement for the multi-tenant data storage system is to provide a reliable Quality of Service (QoS) for the tenants by satisfying the SLAs. Moreover, it should reduce the cloud service provider's operational costs and maximize the utilization of its hardware and software resources. However, designing such an intelligent multi-tenant cloud data storage system poses several critical challenges.
The first challenge is satisfying Quality of Service (QoS) for the tenants by meeting the SLAs. The proposed MTDB-MR algorithm consists of four services.
• The first service is the Global Monitoring service, which monitors all the tenant databases by collecting the response time and the type of each tenant's transaction.
• The second service is the Forecasting service, which uses prediction models to predict the tenant's transaction response time. The forecasting service is built using three different prediction models: the Recursive Window Forecasting Autoregressive Integrated Moving Average (ARIMA) model, the Exponential Moving Average (EMA) model, and the proposed Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells model, each used to predict a future window of 15 minutes of data.
• The third service is the Access Log Analysis service, which takes the migration and replication decision in advance based on both the workload recorded by the global monitoring service and the results of the forecasting service.
• The final service is the Tenant Weight Matrix service, which selects the optimal site to migrate or replicate the violated tenant database according to a set of proposed Migration and Replication Rules (MRR). The proposed MRR is designed to reduce collisions and inconsistencies between migration and replication decisions for a group of violated tenants. In other words, it selects the optimal sites for a group of violated tenants instead of choosing the optimal site for only one violated tenant, as in previous work.
The rest of this paper is organized as follows. Section II reviews the related work. The proposed cluster-based multi-tenant database management system (CB-MT DBMS) and its components are presented in Section III. Section IV presents the proposed multi-tenant database migration and replication (MTDB-MR) algorithm. The experimental evaluations are demonstrated in Section V. Section VI discusses the performance evaluation, and finally, conclusions and future research directions are given in Section VII.

II. Related work
The current literature can be categorized into two subsections. The first subsection discusses the multi-tenant data layer design and the advantages and disadvantages of each design. The second subsection discusses the current multi-tenant replication and migration techniques and their limitations.

A. Multi-Tenant Data Layer Design
In the multi-tenant SaaS architecture, three data layer designs were proposed and used in different application areas [5] [17].
The main difference between these designs is the level of separation of the tenants' data. The first design is called Independent Databases and Independent Database Instances (IDII), where each tenant has its own database system running on the server. However, this design violates the main idea of the multi-tenant SaaS architecture, where tenants should share the same instance of software and hardware. The second design is called Independent Tables and Shared Database Instances (ITSI), where each tenant has its own separate tables running on a shared database instance. Because it uses a single shared database system, this design decreases the maintenance costs compared to IDII. The final design is called Shared Tables and Shared Database Instances (STSI), where the tenants share the same database instance and tables. However, this design is more complex when it comes to customization. In addition, security is a concern for tenants that share the same tables: if there are errors in the tenant application code, a tenant may access all other tenants' data.

B. Multi-Tenant Replication and Migration Techniques
Lately, data replication and migration techniques have gained much attention [14]. Whenever any change in tenant performance is discovered [3], multi-tenant migration and replication techniques are used to transfer the specified tenant under observation to another environment. These techniques are used to decrease the load imposed on the cloud host environment and to consolidate multiple tenants into the same host environment. Thus, exploiting multi-tenant migration and replication techniques aims to share the host resources with a reduced interference between tenants.
Previous studies have claimed that migration and replication strategies raise several important issues. The first issue is that the tenants have irregular workload patterns, which can affect the quality of service. The second issue is when to take the decision to migrate or replicate. The third issue is selecting which tenant is to be the target of migration or replication. The fourth issue is choosing which operation (migration or replication) is to be applied to the target tenant. The final issue is selecting the site to which the tenant is to be replicated or migrated. In addition, according to [26], controlling the number of replicas is also important. To handle the issue of tenants' irregular workload patterns, dynamic provisioning techniques [22] have been designed, which take actions based on observations of the workloads. There are two types of provisioning techniques: reactive and proactive [40]. Proactive techniques use prediction models to predict future tenant access patterns and then use the prediction results to take migration or replication decisions that mitigate the crisis before it happens. In contrast, reactive techniques detect and react to existing SLA violations by using a predefined threshold.

1) Proactive Techniques
The authors of [16] [22] proposed the PredRep approach, which analyzes the cloud database system workload. They stated that tenants have irregular workload patterns, which affect quality of service guarantees, mainly due to the overlap between tenants. They used the SLA (response time) as the target objective for each tenant. However, the authors of [16] [22] stated that, despite the satisfactory results, the proposed PredRep approach has several limitations. Firstly, violated tenants with large databases and irregular workload patterns are not suitable for replication due to the data backup overhead and the restore time. Secondly, the constant change in workload patterns can lead to inaccurate prediction results. Finally, large changes in workload patterns can cause interference with other tenants.
If the SLA of a tenant cannot be satisfied, it must migrate to a site where its quality can be guaranteed. As a result, they proposed an allocation strategy to reduce the provider penalties cost of SLA violations and improve performance. The aim of the allocation strategy is to decide whether the target tenant can be migrated to an existing VM or a new VM.
Consider a violated tenant L in site K that is to be migrated. For every tenant I in a candidate destination site J, the authors defined MT_IJ as the mean execution time of the last M executed transactions, SLA_IJ as the performance service level agreement for tenant I in site J, and N_J as the number of tenants in site J. The allocation strategy sends the tenant to the site X that has the maximum MD'_J, as shown in (1), if and only if site X has free disk space and X ≠ K. However, if one or more tenants in a site have violations, then MD' will not be calculated correctly.
The work of [13], which is the extension of [30], showed that the AutoRegressive Integrated Moving Average (ARIMA) and the Exponential Moving Average (EMA) prediction models obtained similar accuracy for time series prediction. The authors of [25] introduced a data replication strategy that does not predict performance, but once the individual query arrives at the nodes, it estimates the response time of individual queries. Then it evaluates whether the response time can be satisfied or not. If response time cannot be satisfied, it creates a replica only if the creation is profitable. However, this approach can generate many replicas which result in a storage overhead.
RepliC is a cloud-based database replication strategy that supports quality of service, elasticity, and multi-tenancy [21]. Its elasticity mechanism changes the system's capacity depending on the current workload by adding and removing replicas. Extra resources are added if the monitored value does not match the set SLA. The author, however, did not specify the location of the newly added replica.
From a risk management perspective, the authors of [36] suggested a method for SLA violation detection and abatement. For use after the establishment of an SLA, they offered a Risk Management-based Framework for SLA Violation Abatement (RMF-SLA), which includes SLA monitoring, violation prediction, and decision recommendation.
The authors of [37] offered FLAS (Forecasted Load Auto-Scaling), an auto-scaler for distributed services that combines the benefits of proactive and reactive techniques based on the scenario to determine the optimum scaling actions at any given time. The main novelties introduced by FLAS are firstly a predictive model of the high-level metrics trend that allows for the prediction of changes in the relevant SLA parameters (e.g. performance metrics such as response time or throughput) and secondly a reactive contingency system based on the estimation of high-level metrics from resource use metrics, reducing the required instrumentation.
The authors of [38] presented heuristic policies that use the recursive least squares method to forecast QoS parameters and compute the resources required in future intervals; however, the procedure of addressing SLA breaches when they are predicted by the system is not defined.
According to their evaluation results, the authors of [39] obtained their best prediction results by using small prediction intervals together with the autoregressive integrated moving average (ARIMA) method. VOLUME XX, 2017

2) Reactive Techniques
Other authors have adopted reactive techniques that detect and react after a Service Level Agreement (SLA) violation has occurred. The authors of [15] suggested an approach called SWAT, which exchanges a primary replica with one of its secondary replicas to balance the load. The approach is very effective in terms of time and resources as it does not incur primary replica movement. However, the approach has two limitations. The first limitation is that the server can remain overloaded until the approach takes a decision. The second limitation is that the authors assumed that the secondary replica executes only the update queries relayed from the primary replica and that read-only queries impose zero load on the secondary replica, which may not always be valid in real applications.
The authors of [23] proposed two reactive solutions to balance the load in case any machine of the multi-tenant database system becomes overloaded. The first solution swaps the primary replica on the overloaded machine with one of its secondary replicas on any other machine. The second solution migrates the replicas one at a time from the most overloaded machine to the most underloaded machine until either the load of the machine is balanced or the algorithm has migrated the maximum number of database replicas. The algorithm moves the most active database first to mitigate the crisis with the minimum number of migrations. However, the two solutions have various limitations.
The first limitation is that the authors did not clarify the criteria by which they choose the secondary replica to be swapped to mitigate the crisis. The second limitation is that the algorithm does not search for the source of the crisis, as it always searches for any primary replica or selects the most active replica on the overloaded machine to swap or migrate. The third limitation is that the algorithm does not consider the impact of the swapping/migration solution on the query access time. The fourth limitation is the reactive manner of the algorithm, as it does not consider the distribution of the multi-tenant databases over the cloud machines in each region. More specifically, the algorithm runs every week on the dataset, moving the most active replica on the overloaded machine, and then tests the effect of the migration in the following week.
Consequently, the machine can stay overloaded for one or more weeks until the algorithm selects the right replica to swap or migrate, which can increase the number of SLA violations. Moreover, the authors of [22] stated that the algorithm proposed in [23] has a limitation in handling workload changes: when multiple tenants are very active at the same time, SLA violations may occur because only one database is migrated at a time.
The authors of [12] presented a lightweight load balancing mechanism to solve hotspot problems by exchanging the roles of tenants' primary and secondary replicas. They mentioned that the existing replica-exchange load balancing work of [15] cannot be used in practice, because the solution in [15] assumed that each tenant has only one primary replica and one secondary replica used for fault tolerance. It also assumed that read-only queries impose zero load on the tenant secondary replicas, which only execute the update logs relayed from the tenant primary replicas. However, the authors of [12] noted that the load of read-only queries on the tenant secondary replicas should not be neglected in practical use. Consequently, they proposed a load balancing replica exchange strategy that selects a suitable secondary replica among multiple tenant secondary replicas and exchanges its role with the selected tenant primary replica. As a result, it solves the first limitation of the previous work [23].
However, most of the aforementioned limitations of [23] still exist in the algorithm [12] as it does not search for the source of the crisis since it always searches for a primary replica to swap it with one of its secondary replicas. Also, the authors did not mention what happens if the algorithm cannot find a suitable secondary replica. Additionally, the algorithm does not consider the distribution of the multi-tenant databases over the cloud machines in each region while applying the migration processes. Finally, the algorithm does not consider the case of having more than one overloaded host and how to take the optimal solution for each host while reducing the interference between the migration and replication decisions in each host.
Based on the preceding discussion, although various techniques for cloud multi-tenant replication and migration have been proposed in the literature, not all of them assist the service provider with the allocation, replication, and migration decisions necessary to mitigate SLA violations. Table I compares different multi-tenant approaches on the five criteria required for mitigating SLA violations: the ability to predict possible SLA violations in proactive approaches (i.e., Proactive), the ability to detect and react after an SLA violation has occurred in reactive approaches (i.e., Reactive), the ability to migrate the violated tenant to mitigate the crisis (i.e., Migration), the ability to replicate the violated tenant to mitigate the crisis (i.e., Replication), and finally the ability to best allocate all the violated tenants in case there is more than one violated tenant (i.e., Placement).
TABLE I. Comparison of multi-tenant replication and migration approaches. Criteria: Proactive, Reactive, Migration, Replication, Placement (Y = yes, N = no). Limitations are listed per approach.

[21]
• If the monitored value is not in accordance with the defined SLA, more resources are added.
• The author did not determine the location of the newly added replica.

• The authors created a new VM with the slave replica.

• The authors stated that the ARIMA prediction method generates some errors. They suggested studying the behavior of the signals using other prediction methods.

[22] Proactive: Y, Reactive: N, Migration: N, Replication: Y, Placement: N
• Tenants with large databases and irregular workload patterns would not be suitable for replication due to data backup overhead and the time required to restore.
• Constant change in workload patterns can lead to inaccurate forecasting results as well as cause interference with other tenants.

[25] Proactive: Y, Reactive: N, Migration: N, Replication: Y, Placement: N
• This approach may generate many replicas.

• No replication mechanisms are used for the addition of replicas.

[13] Proactive: Y
• The procedure of addressing SLA violations when they are predicted by the system is not defined.

[15]
• The server can get overloaded before the approach makes a choice.
• The authors assumed that the secondary replica only performs update queries relayed from the primary replica, and that read-only queries impose no load on the secondary replica, which may not necessarily be true in practice.
• The existing load balancing work based on replica exchange could not be implemented in practice [12].

[23]
• The authors did not specify the criterion by which they chose the secondary replica to be exchanged to reduce the crisis.
• The algorithm always looks for any primary replica or chooses the most active replica on an overloaded machine to swap or move, instead of looking for the source of the problem.
• The algorithm ignores the influence of the swapping/migration solution on the query access time.
• The algorithm does not account for the distribution of multi-tenant databases across cloud machines in each region. As a result, the system may remain overloaded for one or more weeks until the algorithm finds the appropriate replica to swap or migrate, thus increasing the number of SLA violations.
• When numerous tenants are very active at the same time, SLA violations can occur since only one database is moved at a time [22].

[12]
• The algorithm retains most of the limitations of [23], as it does not look for the source of the crisis because it always looks for a primary replica to swap with one of its secondary replicas.
• The authors did not indicate what occurs if the algorithm is unable to locate a suitable secondary replica.
• When implementing migration operations, the algorithm does not consider the distribution of multi-tenant databases among cloud sites.
• The approach ignores the scenario of multiple overloaded hosts and how to find the best solution for each one while minimizing interference between the migration and replication decisions in each host.

IV. The Proposed Multi-Tenant Database Migration and Replication (MTDB-MR) Algorithm
The primary goal, as stated in the literature review, is to develop a multi-tenant migration and replication algorithm capable of selecting the best solution for several violated tenants based on their irregular workload patterns, instead of generating it for only one violated tenant, as in earlier studies. Finally, it determines the optimal site to migrate or replicate the violated tenants using the tenant weight matrices service and the clustering service, which reduce collisions and inconsistencies between migration and replication decisions for a group of violated tenants.

A. Global Monitoring Service
The main purpose of the global monitoring service is to store the complete information for each tenant transaction. The collected information consists of the transaction response time and the transaction type. This information, together with the location of each tenant and its replicas, is stored in a database called the global catalog log database.

B. Query Coordinator Service
The query coordinator service redirects each tenant transaction to the closest suitable host for execution. As a result, the target tenant or its replicas must be placed on a host that is closest to the sites that issue the most queries. After the transaction is executed, its complete information is stored in the global catalog log database. This information consists of the tenant's transaction response time, the transaction type, the client location, and the tenant location, and it is used later to take either replication or migration decisions.
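The routing step described above can be sketched in a few lines of Python. This is only an illustration, not the implementation used in the paper: the `Replica` record, the `route_transaction` function, and the nested `comm_cost` dictionary are hypothetical names, and "suitable" is assumed to mean that write transactions must go to a primary replica while read-only transactions may go to any replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Hypothetical catalog entry: where a tenant replica lives and its role.
    tenant: str
    site: str
    is_primary: bool


def route_transaction(tenant, client_site, is_read_only, replicas, comm_cost):
    """Return the site of the closest replica that may run the transaction.

    Assumption: secondary replicas accept only read-only transactions,
    so writes are restricted to primary replicas.
    comm_cost[a][b] is the communication cost from site a to site b.
    """
    candidates = [
        r for r in replicas
        if r.tenant == tenant and (is_read_only or r.is_primary)
    ]
    if not candidates:
        raise LookupError(f"no suitable replica for tenant {tenant!r}")
    closest = min(candidates, key=lambda r: comm_cost[client_site][r.site])
    return closest.site
```

With a primary replica on site S1 and a secondary replica on site S3, a read-only transaction from a client near S3 would be sent to S3, whereas an update from the same client would still be sent to S1.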

C. Forecasting Service
Because tenants demand stringent guarantees for the performance and availability of the rented services, which are known as performance service level agreements (SLAs), the forecasting service uses the information in the catalog log database to predict the behavior of the tenant transactions in advance.
When the forecasting service predicts an SLA violation, the proposed MTDB-MR algorithm should have sufficient time to take an action before the violation occurs and thus avoid the SLA violation and its contractual penalties. We developed the proposed forecasting service previously [34] using three different prediction models: the ARIMA model, the EMA model, and the RNN with LSTM cells model. The aim of building the forecasting service with three different prediction models is to compare their accuracy on the same monitored data and in the same multi-tenant environment in order to select the optimal prediction model.
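As a rough illustration of how a forecast can be turned into an early warning, the following Python sketch uses the simplest of the three models, the exponential moving average. It is not the forecasting service of [34]: the function names, the smoothing factor `alpha`, and the rule "flag a violation when the smoothed value exceeds the SLA response time" are assumptions made for this example.

```python
def ema_forecast(response_times, alpha=0.3):
    """One-step-ahead forecast using an exponential moving average.

    alpha (assumed value) weights the newest observation; the remainder
    of the weight stays on the running average.
    """
    if not response_times:
        raise ValueError("at least one observation is required")
    ema = response_times[0]
    for observed in response_times[1:]:
        ema = alpha * observed + (1 - alpha) * ema
    return ema


def predicts_violation(response_times, sla_ms, alpha=0.3):
    """Flag a tenant when the forecast response time exceeds its SLA."""
    return ema_forecast(response_times, alpha) > sla_ms
```

For example, with `alpha=0.5`, the observations 100, 200, and 400 ms give a smoothed value of 275 ms, which would be flagged against an SLA of 250 ms.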

D. Clustering Service
Clustering is usually accomplished by determining the similarity between items based on their characteristics [31] [32]. Based on our literature review, there are two efficient ways to cluster the distributed database sites. The simplest way is to cluster the distributed database sites using the region fields of the database [27], where the sites in the same region are assigned to the same cluster. However, this way of clustering does not work properly when all the sites belong to the same region or when each site belongs to a different region. The second way is to use the communication cost between database sites, where sites are grouped into disjoint clusters according to the least average communication cost between network sites [1][2] [8].
This clustering process depends highly on the communication cost range (CCR) value, which expresses how much time is allowed for the sites of the same cluster to transmit or receive their data. If the communication cost between two sites is less than the CCR, then the two sites are grouped into one cluster.
The main advantages of this strategy are that it minimizes the time required for query processing and data allocation and that it can be implemented in different environments, even if the number of sites is very large. Its dependence on the value of the CCR is considered a minor drawback, as this value can easily be determined by the network administrator. Consequently, we implement this strategy in the multi-tenant environment to minimize the time required to execute the query transactions and to perform multi-tenant database migration and replication.
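The CCR rule can be illustrated with a short Python sketch. It is a simplification rather than the clustering technique of [8]: here any two sites whose communication cost is below the CCR are merged transitively with a union-find structure, whereas the original technique groups sites by least average communication cost. The function name `cluster_sites` and the nested `comm_cost` dictionary are hypothetical.

```python
def cluster_sites(sites, comm_cost, ccr):
    """Group sites into disjoint clusters using a cost threshold.

    Two sites end up in the same cluster when their communication cost
    is below ccr, directly or through a chain of such sites (assumption).
    comm_cost[a][b] is the cost between sites a and b.
    """
    parent = {site: site for site in sites}

    def find(site):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            if comm_cost[a][b] < ccr:
                parent[find(a)] = find(b)

    clusters = {}
    for site in sites:
        clusters.setdefault(find(site), []).append(site)
    return sorted(clusters.values())
```

A smaller CCR yields more, tighter clusters, and a CCR larger than every pairwise cost collapses all sites into a single cluster.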

E. Access Log Analysis Service
As the proposed MTDB-MR algorithm is used for online transaction processing (OLTP) purposes, each tenant database in the multi-tenant architecture exists as one of two replica types: primary or secondary. Primary replicas allow read/insert/delete/update operations, whereas secondary replicas allow only read operations. As a result, to ensure the SLA guarantees, the target tenant or its replicas must be placed on the host closest to the sites that issue most of the queries.
Consequently, the proposed access log analysis service uses the forecasting service results to detect the violated tenants. After that, it uses the monitored information of each violated tenant, which is collected by the monitoring service and stored in the global catalog log database, to take one of two decisions (i.e., replicate the violated tenant or migrate the violated tenant).
That decision is taken based on the number and the types of the violated tenants' transactions, as shown in the flowchart in Fig. 1.

F. Tenants' Weights Matrices Service
To select the optimal sites for a group of violated tenants, and not for only one violated tenant as in previous works, the tenants' weights matrices service applies the proposed Migration and Replication Rules (MRR) (see Table II).

1) TENANT COMMUNICATION COST RULE (TCCR)
To reduce the time required to carry out tenant transactions, the tenant or its replica should be moved closer to the client sites from which the largest number of transactions originate. Thus, the main purpose of the proposed tenant communication cost rule is to predict the total cost of executing all tenant transactions when the violated tenant is migrated or replicated to a particular site. The ideal location for migrating the tenant or its copy is the one that results in the largest reduction in transaction execution time for all client sites, and not for just one site as in previous works. To implement the TCCR rule, the proposed MTDB-MR algorithm uses both the tenant access history (that is, the history collected by the Global Monitoring Service and stored in the catalog log database) and the results of the access log analysis service. The proposed MTDB-MR algorithm predicts the total cost of access to execute the queries of all client sites during the last window before the migration or replication decision is implemented. The pseudocode for generating the TTVM matrix is shown in Fig. 3, whereas the pseudocode for generating the TSVM matrix is shown in Fig. 4.
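The idea behind the rule can be sketched as follows. The cost model is an assumption made for this example, since the rule's formula is not reproduced here: the total cost of placing a tenant on a candidate site is taken to be the sum, over all client sites, of the number of transactions issued by that client site in the last window multiplied by the communication cost between the client site and the candidate site. The names `total_access_cost`, `best_site_by_cost`, `access_counts`, and `comm_cost` are hypothetical.

```python
def total_access_cost(access_counts, comm_cost, candidate_site):
    """Estimated cost for all client sites if the tenant is placed on
    candidate_site.

    access_counts maps each client site to its transaction count in the
    last window; comm_cost[a][b] is the cost between sites a and b.
    The count-times-cost model is an assumption of this sketch.
    """
    return sum(
        count * comm_cost[client][candidate_site]
        for client, count in access_counts.items()
    )


def best_site_by_cost(access_counts, comm_cost, candidate_sites):
    """Pick the candidate site with the lowest total estimated cost."""
    return min(
        candidate_sites,
        key=lambda site: total_access_cost(access_counts, comm_cost, site),
    )
```

Because the sum runs over every client site, a site that is slightly farther from the busiest client can still win if it is much closer to all the others, which is the difference from choosing the site that is best for a single client.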

3) TENANT MIX RULE
As the functionality of the tenant primary and secondary replicas differs in multi-tenant database environments, cloud service providers prefer to host a mix of tenants' primary and secondary replicas on each server to balance the load across all servers [4].
Consequently, the proposed tenant mix rule ensures that each host has a mix of tenants with primary and secondary replicas, balancing the load across all servers by means of the proposed Tenant Mix Matrix (TMM). TMM is a table constructed by placing the target tenants as rows and the target sites as columns. The value of TMM(T_I, S_J) is determined by (6). The value of TMM(T_I, S_J) is equal to 1 if the target tenant T_I can be allocated to the site S_J. To balance the load across all servers, the value can be set to 1 in two cases. The first case is when the violated tenant is a secondary replica and the number of primary replicas in the selected site is greater than 0. The second case is when the violated tenant is a primary replica and the number of secondary replicas in the selected site is greater than 0.
The complexity of generating the TMM is O(T*M), where T is the number of violated tenants and M is the number of multi-tenant system sites.
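The two cases above translate directly into a small Python sketch. The function name `build_tmm` and the input layout are hypothetical, and setting the entry to 0 when neither case applies is an assumption, since the text only states when the value is 1.

```python
def build_tmm(violated_tenants, sites):
    """Build the Tenant Mix Matrix as a dict keyed by (tenant, site).

    violated_tenants maps a tenant name to "primary" or "secondary".
    sites maps a site name to (number of primary replicas,
    number of secondary replicas) currently hosted there.
    """
    tmm = {}
    for tenant, replica_type in violated_tenants.items():
        for site, (n_primary, n_secondary) in sites.items():
            if replica_type == "secondary":
                allowed = n_primary > 0      # first case in the text
            else:
                allowed = n_secondary > 0    # second case in the text
            tmm[(tenant, site)] = 1 if allowed else 0  # 0 is assumed
    return tmm
```

The nested loop visits every tenant and site pair exactly once, which matches the stated O(T*M) complexity.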

4) TENANT SWAP RULE
The authors of [12][23] stated that a performance crisis can be mitigated by exchanging the roles between tenants' primary replicas and secondary replicas. Whenever the performance crisis is detected on a specific site, their scheduling algorithm selects a subset of the tenants in the affected site to swap them with their secondary replicas.
However, this algorithm has many limitations. The first limitation is that a server that has a failure will stay overloaded until the algorithm can select the right tenant and perform the swap. The second limitation is the algorithm always searches for the primary replica to swap it with one of its secondary replicas. Moreover, the crisis will not be mitigated if the algorithm does not find a suitable secondary replica. Finally, it also violates the proposed tenant mix rule, which states that the host must have a mix of tenant's primary and secondary replicas with the goal of balancing the load across all servers.

V. Evaluation
To evaluate the proposed MTDB-MR algorithm in a multi-tenant environment, it is important to use an appropriate standard benchmark. However, according to [21], there is no standard benchmark for assessing multi-tenant environments. Therefore, to simulate a multi-tenant environment, we built a full multi-tenant environment with different databases for our evaluation.
These databases were provided by the OLTPBenchmark framework [35]. This framework provides different benchmarks such as TPC-DS and TPC-H. It draws on over a decade of experience in providing industry-standard workloads and is designed to produce variable-mixture, variable-rate load against any relational database.
The TPC-H Benchmark is a decision support benchmark that consists of a set of business queries and synchronized data modifications.
The queries and data in the TPC-H Benchmark database have been chosen to have remarkable importance at the industry level. The TPC-DS is a standard benchmark for measuring the performance of decision support systems solutions including, but not limited to, Big Data systems.
The evaluation section is divided into six subsections. The first subsection describes the creation of an evaluation environment for the multi-tenant database workload using two distinct TPC benchmarks.
The final subsection compares the performance of the proposed MTDB-MR method to the performance of the following state-of-the-art migration models: swap algorithm [12], step algorithm [23], allocation strategy [16] [22], and basic strategy for mitigating the crisis of the two violated tenants (TPC-DS1, TPC-H2).

A. Building the Simulated Multi-Tenant Database Environment
The performance of the proposed MTDB-MR algorithm is studied in a simulated environment. The simulated environment consists of 8 Fujitsu esprimo-P556 sites grouped into 3 clusters, as shown in Fig. 5. Each site has a 3.40 GHz Core i7 processor and 8 GB of DDR3 memory, and uses SQL Server as its DBMS.
The clustering technique presented in [8] is used to group the sites into the three clusters shown in Fig. 5, based on the assumed communication costs between sites, in milliseconds, listed in Table III. These communication costs are assigned randomly so that the clustering algorithm can be applied.
As illustrated in Fig. 5, each site has one DBMS with different tenant databases of different types (primary and secondary). In addition, each site runs a SQLQueryStress Performance Testing Tool [9], which simulates the multi-tenant client's connections.
Six instances of the TPC benchmarks, TPC-DS and TPC-H with their secondary replicas, were used. They are distributed randomly across the sites of the multi-tenant system to study the ability of the proposed algorithm to take an optimal decision for a group of violated tenants, rather than for only one tenant as in the previous migration algorithms. The distribution of the tenants is shown in Fig. 5. As mentioned before, and based on the merits and demerits of the different multi-tenant data layer designs, the data layer is built using the Independent Tables and Shared Database Instances (ITSI) design, in which each tenant has its own separate tables running on a shared database instance; this decreases the maintenance cost compared with other designs.
SQLQueryStress is a tool designed to test query loads. It allows the user to specify the number of virtual users (up to 200) and the number of iterations, i.e., the number of times each of the simultaneously running virtual users executes a test query to simulate the workload. However, the proposed MTDB-MR algorithm needs the access history of each violated tenant in order to take an appropriate decision for the violated tenants. The SQLQueryStress tool was therefore modified to record the query type and the response time of each executed iteration.
To simulate the multi-tenant database workload at each tenant site, a procedure is built for each TPC benchmark that contains a list of the benchmark's sample queries, identified by QueryID [10]. If this procedure were run in a loop with the same QueryID, every run after the first would be faster because of data caching, which could affect the accuracy of the test. To avoid this, SQLQueryStress sends a different QueryID to each virtual user on every iteration.
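The QueryID rotation described above can be sketched as follows. This is a minimal illustration and not the modified SQLQueryStress code: the function name `rotate_query_ids`, the round-robin scheme, and the sample QueryIDs are all assumptions, since the text only states that each virtual user must receive a different QueryID every time.

```python
def rotate_query_ids(query_ids, virtual_users, iterations):
    """Return, per virtual user, the QueryID to send on each iteration.

    Assumed scheme: user u starts at offset u in the list and advances by
    one position per iteration, so two consecutive runs by the same user
    never use the same QueryID.
    """
    if len(query_ids) < 2:
        raise ValueError("need at least two QueryIDs to avoid repeats")
    n = len(query_ids)
    return {
        user: [query_ids[(user + it) % n] for it in range(iterations)]
        for user in range(virtual_users)
    }


# Hypothetical QueryIDs; the real list comes from the benchmark procedure.
plan = rotate_query_ids([1, 2, 3], virtual_users=2, iterations=4)
print(plan)
```

With this scheme a QueryID recurs only after the whole list has been traversed, so a longer query list pushes repeated (and possibly cached) executions further apart.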
To verify the quality of the proposed MTDB-MR algorithm, violations were simulated in two different tenants (TPC-DS1 and TPC-H2) located at different sites in different clusters.
To trigger a violation, the query rate and the number of virtual users accessing the tenant simultaneously were increased. The distribution of the queries, the number of virtual users, and the type of queries submitted by each multi-tenant client site to each target tenant (TPC-DS1, TPC-H2) are shown in Table IV.

B. Apply Swap Algorithm
To mitigate the crisis for the first violated tenant (TPC-DS1)(P), the Swap algorithm [12] searches for a primary replica at the violated site and swaps it with one of its secondary replicas at a different site. According to the tenant distribution shown in Fig. 5, the violated site 3 contains only one primary replica. As a result, the Swap algorithm swaps it with its secondary replica located at site 4.
For the second violated tenant (TPC-H2)(P), the Swap algorithm [12] again searches for a primary replica at the violated site. However, the violated site has only one primary replica, TPC-H2, which is itself the source of the violation. The Swap algorithm [12] cannot solve this case because it does not consider a tenant that has no secondary replica at any other site.
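The two outcomes above can be reproduced with a small sketch of the swap step. It is a simplified reading of the behaviour described here, not the implementation of [12]: the `placement` layout, the function name `swap_primary`, and the site label "H" for the TPC-H2 site are hypothetical.

```python
def swap_primary(placement, violated_site):
    """Swap roles between a primary replica on the violated site and one of
    its secondary replicas elsewhere.

    placement maps site -> {tenant: "P" or "S"}. Returns (tenant, other_site)
    after the swap, or None when no primary on the violated site has a
    secondary replica on another site.
    """
    for tenant, role in list(placement[violated_site].items()):
        if role != "P":
            continue
        for site, tenants in placement.items():
            if site != violated_site and tenants.get(tenant) == "S":
                placement[violated_site][tenant] = "S"  # demote old primary
                tenants[tenant] = "P"                   # promote the secondary
                return tenant, site
    return None


# Toy placement: TPC-DS1 is primary on site 3 with a secondary on site 4;
# TPC-H2 sits on a site labelled "H" (hypothetical) and has no secondary.
placement = {3: {"TPC-DS1": "P"}, 4: {"TPC-DS1": "S"}, "H": {"TPC-H2": "P"}}
print(swap_primary(placement, 3))    # swap succeeds for TPC-DS1
print(swap_primary(placement, "H"))  # no secondary replica, nothing to swap
```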

C. Apply Step Algorithm
To mitigate the crisis for the first violated tenant (TPC-DS1)(P), the Step algorithm [23] proposes two reactive solutions. The first solution gives the same result as the Swap algorithm [12]: it swaps the primary replica on the overloaded machine with one of its secondary replicas on another machine. The second solution moves the most active database from the overloaded site to the least active site, mitigating the crisis with the minimum number of migrations.
According to the tenant distribution shown in Fig. 5, the least active site is site 4. However, site 4 contains a secondary replica of the target tenant TPC-DS1. As a result, the second solution of the step algorithm [23] migrates the tenant to site 2 which is the second least active site. Regarding the second violated tenant (TPC-H2)(P), the Step algorithm [23] sends the violated TPC-H2 tenant to site 4 which is the least active site.
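The target selection in the second Step solution can be sketched as below. This is only an illustration of the rule as described in this section: the function name `least_active_target`, the activity scores, and the source site of TPC-H2 (labelled 5 here) are assumed values, chosen so that site 4 is the least active site and site 2 the second least active.

```python
def least_active_target(activity, hosts, tenant, source_site):
    """Pick the least active site that does not already hold a replica of
    the tenant.

    activity maps site -> load score (lower means less active);
    hosts maps site -> set of tenants with a replica on that site.
    """
    candidates = [
        site for site in activity
        if site != source_site and tenant not in hosts.get(site, set())
    ]
    if not candidates:
        return None
    return min(candidates, key=activity.get)


# Hypothetical load scores for a few sites.
activity = {2: 0.3, 3: 0.9, 4: 0.1, 5: 0.8}
hosts = {3: {"TPC-DS1"}, 4: {"TPC-DS1"}, 5: {"TPC-H2"}}
print(least_active_target(activity, hosts, "TPC-DS1", 3))  # skips site 4
print(least_active_target(activity, hosts, "TPC-H2", 5))   # takes site 4
```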

D. Apply the allocation strategy
To mitigate the crisis for the first violated tenant (TPC-DS1)(P), the allocation strategy of [16] [22] uses the SLA (response time) as the target objective for each tenant. Therefore, the tenant TPC-DS1 is migrated to site 7, which has the maximum average free time. For the second violated tenant (TPC-H2)(P), the allocation strategy of [16] [22] migrates the tenant TPC-H2 to site 1, which has the second maximum average free time, while TPC-DS1 is sent to the site with the maximum average free time (i.e., site 7).
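A rough sketch of this free-time ranking is given below. It assumes that violated tenants are handled in order and that each one takes the next best site; the function name `allocate_by_free_time` and the free-time values are hypothetical and are not taken from [16] [22].

```python
def allocate_by_free_time(free_time, tenants):
    """Assign each violated tenant, in order, to the site with the next
    largest average free time (one tenant per site)."""
    ranked = sorted(free_time, key=free_time.get, reverse=True)
    return dict(zip(tenants, ranked))


# Hypothetical average free times per site.
free_time = {1: 40.0, 6: 25.0, 7: 55.0}
print(allocate_by_free_time(free_time, ["TPC-DS1", "TPC-H2"]))
```

With these assumed values, TPC-DS1 goes to site 7 and TPC-H2 to site 1, matching the decisions described above.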

E. Apply the proposed MTDB-MR algorithm
The proposed MTDB-MR algorithm is designed to reduce collisions and inconsistencies between migration and replication decisions for a group of violated tenants. In other words, it makes optimal decisions for a group of violated tenants instead of choosing the optimal solution for only one violated tenant.

Based on the tenants' access history, which is stored in the global catalog log database, the access log analysis service analyzes the transactions of each target tenant (TPC-DS1 and TPC-H2) to take one of two decisions: replicate the target tenant or migrate it. According to the results of the access log analysis service, tenant TPC-DS1 should be migrated to another site, while tenant TPC-H2 should be replicated to another site to mitigate the crisis.
To select the optimal sites to which the violated tenants are replicated and migrated, the proposed MTDB-MR algorithm constructs the TSWM from the rule-based weight matrices: the tenant mix matrix (TMM), the tenant swap matrix (TSM), the tenant execution cost matrix (TECM), and the tenant site violation matrix (TSVM).
The generated TSWM, shown in Table V, is used as the basis for assigning the tenants to the sites. The main objective is to assign the tenants to the sites optimally while reducing the interference between the migration and replication decisions for each target tenant. In the TSWM, the highest weight in the row of each violated target tenant identifies a convenient site to which that tenant can be migrated or replicated to mitigate the crisis. As a result, TPC-DS1 is migrated to site 6 and TPC-H2 is replicated to site 3.
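The row-wise selection from the TSWM can be sketched as follows. The weights below are placeholders and are not the values of Table V; they are chosen only so that the sketch returns the same sites as the text. The function name `best_sites` is hypothetical, and the construction of the TSWM from the TMM, TSM, TECM, and TSVM is not modelled.

```python
def best_sites(tswm):
    """For each violated tenant (row), return the site with the highest weight.

    tswm maps tenant -> {site: weight}. On a tie, the site listed first
    in the row is returned.
    """
    return {tenant: max(row, key=row.get) for tenant, row in tswm.items()}


# Placeholder weights, NOT the values reported in Table V.
tswm = {
    "TPC-DS1": {1: 0.20, 3: 0.10, 6: 0.70},
    "TPC-H2": {1: 0.30, 3: 0.60, 6: 0.10},
}
print(best_sites(tswm))
```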

F. Evaluate the accuracy of the Proposed MTDB-MR and all previous algorithms
In this part, the proposed MTDB-MR for multi-tenant database migration and replication will be compared to prior works to demonstrate its superiority in dealing with multitenant migration and replication difficulties while reducing SLA violations. Clients may be lost if SLAs are violated at a higher rate. As a consequence, we evaluate the influence of the proposed MTDB-MR method, as well as all previous techniques, on the overall number of SLA violations.
To show the superiority of the proposed MTDB-MR algorithm, we compare its performance to that of the prior algorithms (Swap algorithm [12], Step algorithm [23], Allocation strategy of [16] [22], and basic strategy) on the following performance factors:
• The overall number of SLA violations.
• The number of SLA violations by multi-tenant clients.
• The average response time of each client site's queries after applying the proposed MTDB-MR algorithm and the previous algorithms.
• The total execution time of each multi-tenant client site.
The evaluation findings for the first violated tenant, TPC-DS1, are shown in Fig. 6. To begin, Fig. 6 (A) shows that the proposed MTDB-MR reduces the number of SLA violations by 78.5% relative to the basic strategy with no migration and replication, whereas the Swap algorithm [12], Step algorithm [23], and Allocation strategy of [16] [22] increase the number of SLA violations because they neither identify the source of the problem nor determine the right site to which the violated tenant replica should be migrated. Second, as shown in Fig. 6 (B), the proposed MTDB-MR decreases the overall number of violations at each client site when compared to the prior Swap algorithm [12], Step algorithm [23], Allocation strategy of [16] [22], and basic strategy.
Similarly, Fig. 6 (C) illustrates the average response time of each client site's queries after applying the proposed MTDB-MR algorithm, as well as the prior Swap algorithm, Step algorithm, Allocation strategy, and basic strategy. It demonstrates that the proposed MTDB-MR decreases the average response time for multiple clients by an average of 27.5% relative to the basic strategy without migration and replication. Fig. 6 (B) and Fig. 6 (C) demonstrate that the proposed MTDB-MR chooses the optimal site to which the violated TPC-DS1 tenant is moved, resulting in significantly fewer SLA violations and a lower average response time for multiple multi-tenant client sites.
Finally, Fig. 6 (D) illustrates that, compared to all prior algorithms including the basic strategy, the proposed MTDB-MR algorithm significantly decreases the overall execution time of the transactions completed by each client site. It reduces the overall execution time for all transactions by 43.69% relative to the basic strategy with no migration or replication. The findings presented in Fig. 6 (B), Fig. 6 (C), and Fig. 6 (D) also show that the basic strategy and the swap algorithm appear to be superior to our MTDB-MR method at just one location (site 4).
The following are the reasons behind this: In the basic approach, TPC-DS1 tenant and transactions are on the same site (site 4), therefore all TPC-DS1 tenant transactions are local to that site.
Furthermore, in the swap method, the tenant is switched with their secondary replica located at site 3 in the same cluster, and all TPC-DS1 tenant transactions in site 4 are deemed local to the new site 3, which is also located in the same cluster.
The evaluation findings for the second violated tenant, TPC-H2, are shown in Fig. 7. To begin, Fig. 7 (A) demonstrates that the proposed MTDB-MR reduces the number of SLA violations by 99.5% relative to the basic strategy.
The swap algorithm [12], on the other hand, is unable to handle this issue since it does not take into account the scenario where a tenant has no secondary replicas in any other sites. As a result, both the swap algorithm and the basic strategy have the same number of violations.
Compared to the basic strategy, the allocation and step algorithms show either a slight increase or a slight decrease in the number of SLA violations, since they neither determine the source of the problem nor choose the optimal site to which the tenant should be migrated.
To summarize, current experiment findings demonstrate that the proposed MTDB-MR algorithm considerably decreases the overall number of violations when compared to all prior methods, including the basic strategy. According to the studies, the proposed MTDB-MR determines the optimal location to replicate the violating tenant, resulting in fewer SLA violations.
Second, Fig. 7 (B) demonstrates that the proposed MTDB-MR significantly reduces the average response time for multiple multi-tenant client sites: the average response time for many clients is reduced by 45.875% compared to the basic strategy without migration and replication. Finally, Fig. 7 (C) illustrates that the proposed MTDB-MR reduces the total execution time for all transactions by 45.8% relative to the basic strategy with no migration and replication.

VI. Discussion on Performance Evaluation
Recently, data allocation, replication, and migration techniques have received much attention [14]. Whenever a change in a tenant's performance is detected [3], these techniques are used to move the violated tenant to another environment. They reduce the load on a cloud host environment and consolidate multiple tenants into the same host environment. However, designing effective strategies for allocating, migrating, and replicating tenant databases raises several critical issues. Firstly, the different performance and availability requirements (SLAs) of all tenants should be considered. In other words, meeting the Service Level Agreement (SLA) is key to achieving a high overall quality of service. High SLA violation rates indicate the potential for losing a tenant. Therefore, from a cloud provider's perspective, the number of SLA violations should be reduced to reduce costs. As a result, the first task is to ensure the SLA for a group of tenants under the current workload while maximizing the utilization of hardware and software resources with few SLA violations. Secondly, tenants have irregular workload patterns, which require constant adjustments due to unexpected changes in workload and diversity. As a result, the second task is to migrate and replicate the tenant databases on demand so that the irregular workload is distributed over a flexible set of devices. Thirdly, when more than one tenant is violated, finding an optimal allocation of the migrated and replicated tenant databases can be very complex and requires good knowledge and experience. As a result, the third task is to allocate the replicated and migrated tenant databases.
A new dynamic proactive multi-tenant database migration and replication (MTDB-MR) algorithm is proposed to handle irregular workloads and to meet the multi-tenant quality of service by avoiding service level agreement violations. To evaluate the performance of the proposed MTDB-MR algorithm in a multi-tenant environment, it is important to use an appropriate standard criterion. However, according to [21], there is no standard benchmark for assessing multi-tenant environments. Consequently, multiple instances of the TPC benchmarks TPC-DS and TPC-H are used to simulate a multi-tenant environment. Each tenant has irregular workload patterns, which require seamless adjustments due to unexpected workload changes and variability.
Experimental results show that the proposed MTDB-MR algorithm is the ideal candidate for migration and replication of the violated multi-tenant databases, as it detects the source of the problem and selects the optimal operation to apply based on the results of the access log analysis service. It also selects the optimal site to which the violated tenant is migrated or replicated according to a set of rules, which reduces the expected number of violations and the expected execution time for all sites rather than for only one site, as in previous works. Moreover, the proposed algorithm considers the tenant mix and swap rules to distribute the query load and minimize the number of migrations.

VII. Conclusion and Future Work
A multi-tenant database plays a predominant role in hosting multiple tenants within a single DBMS with active sharing of resources. Meeting the tenants' performance goals is a challenge for cloud service providers, as they must balance the performance they provide to their tenants against the operating costs. Additionally, tenants may have erratic workload patterns that negatively impact the SLA guarantees. Therefore, a promising solution for service providers is to replicate and migrate the tenants' databases, which benefits service availability, performance, flexibility, and quality. In this paper, a new cluster-based multi-tenant database management system (CB-MT DBMS) is proposed. In addition, a dynamic proactive multi-tenant database migration and replication (MTDB-MR) algorithm is proposed, which uses prediction results to anticipate the need for migration and replication of multi-tenant data and to meet the multi-tenant quality of service by avoiding service level agreement violations. Experimental results show that, compared to the basic strategy, the proposed MTDB-MR algorithm significantly reduces the average response time by 36.68% on average and the total number of violations by 89% on average for more than one tenant client site, and also reduces the total execution time by 44.74% on average. As future work, the proposed MTDB-MR algorithm will be extended to use different prediction models in the forecasting service.
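The averaged figures quoted above appear to follow from the per-tenant reductions reported in Section V-F for TPC-DS1 and TPC-H2. The short check below assumes that each headline number is the simple mean of the two per-tenant reductions relative to the basic strategy; the dictionary keys are illustrative names.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Per-tenant reductions (%) relative to the basic strategy: [TPC-DS1, TPC-H2].
reductions = {
    "sla_violations": [78.5, 99.5],
    "avg_response_time": [27.5, 45.875],
    "total_execution_time": [43.69, 45.8],
}

summary = {metric: mean(values) for metric, values in reductions.items()}
for metric, value in summary.items():
    print(f"{metric}: {value:.4f}%")
```

Under this assumption the means are 89%, 36.6875%, and 44.745%, which correspond to the 89%, 36.68%, and 44.74% reported in the conclusion.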