Scalability analysis comparisons of cloud-based software services

Performance and scalability testing and measurement of cloud-based software services are necessary for future optimizations and growth of cloud computing. Scalability, elasticity, and efficiency are interrelated aspects of the performance requirements of cloud-based software services. In this work, we use a technical measurement of the scalability of cloud-based software services. Our technical scalability metrics are inspired by metrics of elasticity. We used two cloud-based systems to demonstrate the usefulness of our metrics and to compare their scalability performance on two cloud platforms: Amazon EC2 and Microsoft Azure. Our experimental analysis considers three sets of comparisons: first, we compare the same cloud-based software service hosted on two different public cloud platforms; second, we compare two different cloud-based software services hosted on the same cloud platform; finally, we compare the same cloud-based software service hosted on the same cloud platform under two different auto-scaling policies. We note that our technical scalability metrics can be integrated into a previously proposed utility-oriented metric of scalability. We discuss the implications of our work.


INTRODUCTION
Cloud-based applications are increasing rapidly as hosting costs have fallen and computing resources have become more available and efficient. In order to maximize the scalability and performance of any software system, it is essential to incorporate performance and scalability testing and assessment into the development lifecycle. This provides an important foundation for future optimization and supports the Service Level Agreement (SLA) compliant quality of cloud services [1,2]. Three typical requirements are associated with the performance of cloud-based applications: scalability, elasticity, and efficiency [3,4].

In this study, we adopt the technical definitions of these performance features identified by Lehrig et al. [5]. Scalability is the ability of the cloud layer to increase the capacity of the software service delivery by expanding the quantity of the software service that is provided. Elasticity is the level of autonomous adaptation provided by the cloud layer in response to variable demand for the software service. Efficiency is the measure of matching the quantity of software service available for delivery with the quantity of demand for the software service. However, we note that alternative, utility-oriented (i.e. economic cost/benefit focused) approaches are also used in the literature for the conceptualization and measurement of these performance aspects of cloud-based services [6,7]. Technical scalability measurement and testing are key to assessing and measuring the performance of cloud-based software services [1,8]. Both the elasticity and efficiency aspects depend on scalability performance.

Cloud computing, auto-scaling, and load-balancing features make cloud-based applications more scalable, allowing such applications to deal with sudden workloads by adding more instances at runtime.
Furthermore, as cloud-based applications are offered as Software as a Service (SaaS) and increasingly use multi-tenancy architectures [9], the need for scalability that supports the availability and productivity of services and on-demand resources becomes even more pronounced.

A relevant systematic literature review reports only a few research works (e.g. project reports, MSc theses) that try to address the assessment of the technical scalability of cloud-based software services [5]. Recently, however, a number of publications have addressed the technical measurement of the elasticity of cloud-based provision of software services [5,10], while other recent publications address the scalability of cloud-based software services from a utility perspective [5,6,7,11].

In order to improve the scalability of any software system, we need to understand the system components that affect and contribute to the scalability performance of the service. This helps in designing suitable test scenarios and provides a basis for future work aiming to maximize the service's scalability performance. Assessing scalability from a utility perspective is insufficient for this purpose, as it works at an abstract level which is not necessarily closely related to the technical components and features of the system.

In this paper, we use technical scalability measurements and metrics [12] for cloud-based software services, inspired by earlier technical measures of cloud elasticity [13,14,15]; this work extends our previous works [12,16]. We demonstrate the application of the metrics using two cloud-based software services (OrangeHRM and MediaWiki) run on the Amazon EC2 and Microsoft Azure clouds. We perform three comparisons, the first one between the same cloud-based software service hosted on two different public cloud platforms.
The second comparison is between two different cloud-based software services hosted on the same cloud platform. The third comparison is between the same cloud-based software service hosted on the same cloud platform with different auto-scaling policies. We show how the metrics can be used to reveal differences in system behavior under different scaling scenarios, and we discuss how these metrics can be used for measuring and testing the scalability of cloud-based software services.

The rest of the paper is organized as follows: Section 2 presents related work. A description of our approach to measuring the scalability of cloud-based software services, and our metrics based on this measurement approach, are presented in Section 3. Section 4 presents our experiments and analyses using two different usage scenarios and three sets of comparisons to demonstrate the measurement approach and metrics results. Next, we discuss the implications and importance of the approach and metrics in Section 5. Finally, we present our conclusions and future work in Section 6.

RELATED WORK
Related reviews [17,18] highlight scalability and performance testing and assessment for cloud-based software services as promising research challenges and directions. A related mapping study [19] highlights that the majority of studies in software cloud testing present early results, which indicates growing interest across the field and also the potential for much more research to follow these early results.

A relevant systematic literature review [5] covers cloud performance assessments and metrics in terms of scalability, elasticity, and efficiency. Its key findings include: most of the reviewed papers focus on elasticity, and in terms of scalability, the papers present either early and preliminary results or initial ideas of research students. The review [5] provides the definitions of the key performance aspects (scalability, elasticity, and efficiency) which have been adopted in this study. Other similar recent surveys [20,21] focus primarily on cloud service elasticity.

The majority of the studies focus on measuring the elasticity of cloud services from a technical perspective [4,10,15,22,23,24,25,26]. For example, Herbst et al. [4] set out a number of key concepts that allow measuring cloud service elasticity in technical terms (see Fig. 1), such as the quantity and time extents of periods when the service provision is either below or above what is required by the service demand. Elasticity measures along these lines are defined by [4,22].

From the utility-oriented perspective of measuring and quantifying scalability, we note the work of Hwang et al. [7,11]. Their production-driven scalability metric includes the measurement of quality-of-service (QoS) and the cost of that service, in addition to a performance metric from a technical perspective [7,11].
This approach is useful from a utility perspective, but as it depends on multiple facets of the system (including cost measures), it is unlikely to provide useful and specific information about the contribution of system components to scalability from a technical perspective.

Technical measurements or metrics for cloud-based software scalability are limited. For example, [4] provides a technical scalability metric, but this is a rather elasticity-driven metric which measures the sum of over- and under-provisioned resources over the total length of time of service provision. Jayasinghe et al. [13,14] provide a technical scalability measure in terms of throughput and CPU utilization of the virtual machines, but do not formulate an explicit scalability metric. Jamal et al. [27] describe practical measurements of system throughput with and without multiple virtual machines (VMs), again without clearly formulating a specific measurement or metric of scalability. Gao et al. [15] evaluate Software as a Service (SaaS) performance and scalability from the perspective of system capacity, using system load and capacity as measurements for scalability. Another recent work [28] focuses on building a model that helps to measure and compare different deployment configurations in terms of costs, capacity, and elasticity. Brataas et al. [29] offered two scalability metrics: one based on the relationship between the capacity of cloud software services and their use of cloud resources, and a second, cost scalability metric that replaces cloud resources with cost; to demonstrate the metrics, they used the CloudStore application hosted on Amazon EC2 with different configurations. In earlier work, [30] provides a theoretical framework of scalability for mobile multi-agent systems, which, however, remains limited to theory and modeling results.
In terms of comparisons, we note that [13,14] compare the performance of the same applications deployed across different cloud platforms.

MEASURING THE SCALABILITY OF CLOUD-BASED SOFTWARE SERVICES
Scalability is the ability of the cloud-based system to increase the capacity of the software service delivery by expanding the quantity of the software service that is provided, when such an increase is required by increased demand for the service over a period of time during which the service is exposed to a certain variation in demand (i.e. a demand scenario) [5]. Our focus is on whether the system can expand in terms of quantity (scalability) when required by demand over a sustained period of service provision, according to a certain demand scenario. We are not concerned with short-term flexible provision of resources (elasticity of the service provision) [22]. The purpose of elasticity is to match the service provision with the actual amount of needed resources at any point in time [22]. Scalability is the ability to handle the changing needs of an application within the confines of the infrastructure by adding resources to meet application demands as required, in a given time interval [5,32]. Therefore, elasticity is scaling up or down at a specific time, while scalability is scaling up by adding resources in the context of a given time frame. Scalability is an integral measurement of the behavior of the service over a period of time, while elasticity is a measurement of the instantaneous behavior of the service in response to changes in service demand. Furthermore, we are not concerned with the efficiency of cloud-based software service delivery, which is usually measured by the consumption of resources (i.e. cost and power consumption) required to complete the desired workload [5].

The increase of cloud capacity usually happens by expanding the volume of service demands served by one instance of the software, by providing a lower volume of service through multiple instances of the same software, or by a combination of these two approaches.
Generally, we expect that if a service scales up, the increase in demand for the service should be matched by a proportional increase in the service's provision without degradation in quality. In this work, the quality of the service may be seen, for example, in terms of response time.

The ideal scaling behavior of the service system should be sustained over a sufficiently long timescale, in contrast with cloud elasticity, which looks at short-term mismatches between provision and demand. If the system shows ideal scaling behavior, it will increase the volume of the service without changing the quality of that service. Ordinarily, real systems are expected to behave below the level of the ideal scaling, and the aim of scalability testing and measurement is to quantify the extent to which the real system behavior differs from the ideal behavior.

To match the ideal scaling behavior, we expect that the system will increase the quantity of the software instances proportionately with the rise in demand for the software services, i.e. if the demand is doubled, we would ideally expect the base number of software instances to also double. We also expect that the system maintains quality of service in terms of maintaining the same average response time irrespective of the volume of service requests, i.e. if demand is increased by 25%, we would ideally expect no increase in average response time. Formally, let us assume that D and D' are two service demand volumes, D' > D. Let I and I' be the corresponding numbers of software instances that are deployed to deliver the service, and let tr and t'r be the corresponding average response times. If the system scales ideally, we expect that for any levels of service demand D and D' we have:

I' / I = D' / D    (1)

t'r = tr    (2)

In general, real-world cloud-based systems are unlikely to deliver the ideal scaling behavior.
Given the difference between the ideal and the actual system scaling behavior, it makes sense to measure technical scalability metrics for cloud-based software services using the ideal scalability behavior defined in equations (1) and (2) as reference.

In terms of the provision of software instances for the delivery of the services, the scaling is deficient if the number of actual instances is lower than the ideally expected number of instances. To quantify the level of deficiency, we pick a demand scenario and start with a low level of characteristic demand D0, measuring the corresponding volume of software instances I0. Then we measure the numbers of software instances Ik corresponding to a number (n) of increasing demand levels Dk following the same demand scenario; we can then calculate how close the Ik values are to the ideal I*k values (in general we expect Ik < I*k). Following the ideal scalability assumption of equation (1), we get for the ideal I*k values:

I*k = I0 · (Dk / D0)    (3)

Considering the ratio between the area defined by the (Dk, Ik) values, k = 0,…,n, and the area defined by the (Dk, I*k) values, we get the metric of service volume scalability of the system, ηI:

ηI = A / A*    (4)

where A and A* are the areas under the curves, evaluated piecewise as shown in Fig. 3A, calculated for the actual and ideal I values, and ηI is the volume scalability performance metric of the system. The system is close to the ideal volume scalability if ηI is close to 1. If the opposite is the case and ηI is close to 0, then the volume scalability of the system is much less than ideal.

We define the system quality scalability in a similar manner, by measuring the service average response times tk corresponding to the demand levels Dk. Here, the system average response time is measured as the average time that the system takes to process a request once it is received.
We approximate the ideal average response time as t0, following the ideal assumption of equation (2). The system quality scalability is less than ideal if the average response times for increasing demand levels increase, i.e. tk > t0. By considering the ratio between the area defined by the (Dk, t0) values and the area defined by the (Dk, tk) values, k = 0,…,n, we get a metric of service quality scalability for the system, ηt:

ηt = B* / B    (5)

where B and B* are the areas under the curves, evaluated piecewise as shown in Fig. 3B, calculated for the actual and ideal t values, and ηt is the quality scalability performance metric of the system. If ηt is close to 1, the system is close to ideal quality scalability. On the other hand, if ηt is close to 0, the quality scalability of the system is far from the ideal.

In Fig. 3A, A* is the shaded area under the red line indicating the expected ideal behavior (see equation (1)) and A is the shaded area under the blue curve, which corresponds to the actual volume scaling behavior of the system. The blue curve is expected in general to be under the ideal red line, indicating that the volume scaling is less efficient than the ideal scaling. In Fig. 3B, B* is the shaded area under the red line indicating the expected ideal behavior (see equation (2)) and B is the area under the blue curve, showing the actual quality scaling behavior of the system. Again, in general, we expect that the blue curve is above the ideal red line, indicating that the quality scaling is below the ideal. We chose nonlinear curves for the examples of actual scaling behavior (blue curves in Fig. 3) to indicate that the practical scaling of the system is likely to respond in a nonlinear manner to changing demand.

The above-defined scalability metrics allow the effective measurement of the technical scalability of cloud-based software services. These metrics do not depend on other utility factors such as cost and non-technical quality aspects.
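The two area-ratio metrics can be computed directly from the measured (Dk, Ik) and (Dk, tk) series. The following is a minimal sketch (the function names and the piecewise trapezoidal evaluation are our own illustration of the definitions above, not the paper's tooling):

```python
def _trapezoid(xs, ys):
    """Piecewise trapezoidal area under the curve defined by (xs, ys)."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))

def volume_scalability(demands, instances):
    """Volume scalability metric: A / A*, where the ideal instance
    counts follow I*_k = I_0 * D_k / D_0 (equation (1))."""
    ideal = [instances[0] * d / demands[0] for d in demands]
    return _trapezoid(demands, instances) / _trapezoid(demands, ideal)

def quality_scalability(demands, resp_times):
    """Quality scalability metric: B* / B, where the ideal average
    response time stays constant at t_0 (equation (2))."""
    ideal = [resp_times[0]] * len(demands)
    return _trapezoid(demands, ideal) / _trapezoid(demands, resp_times)
```

For a system that exactly doubles its instance count whenever demand doubles, `volume_scalability` returns 1; values below 1 quantify the shortfall from the ideal scaling behavior, mirroring the area ratios of Fig. 3.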
This independence from utility factors allows us to use these metrics in technically focused scalability tests that aim to identify components of the system that have a critical impact on the technical scalability, and additionally to test the impact of any change in the system on the system's technical scalability. Scalability performance here refers to the service volume and service quality scalability of the software service; these two technical measurements together reflect the scalability performance of the cloud-based software service.

Applying these metrics to different demand scenarios allows the testing and tuning of the system for particular usage scenarios, and an understanding of how system performance can be expected to change as the pattern of demand varies. Such application of these metrics may highlight trade-offs between the volume scaling and quality scaling of the system that characterize certain kinds of demand pattern variation (e.g. the impact of the transition from low-frequency peak demands to high-frequency peak demands, or of seasonal change in demand). Understanding such trade-offs can help in tailoring the system to its expected or actual usage.

EXPERIMENTS AND ANALYSES

We used two cloud-based software services (OrangeHRM and MediaWiki), chosen in part for the RESTful nature of the applications, which is widely adopted by cloud and application providers. The architecture of these applications supports REST caching to improve performance and scalability: by caching the data and the code, the amount of time required to execute each HTTP request is reduced, thereby improving response times by serving data more quickly [35,36].

The purpose is to check the scalability performance of cloud-based applications using different cloud environments, configuration settings, and demand scenarios. We applied similar experimental settings for the same cloud-based system (OrangeHRM) in two different cloud environments (EC2 and Azure).
We changed the parameters for MediaWiki, which runs on a different instance type in the AWS EC2 environment.

The cloud resources must be adequately configured to match the workload in order to achieve efficient performance and scalability. We considered two demand scenarios, as shown in Fig. 2. The first scenario follows a steady rise and fall of demand; the second follows a step-wise increase and decrease of demand (see Fig. 2).

In terms of quality scalability, the EC2-hosted system scales much better in the context of the first scenario, the steady rise and fall of demand, than in the case of the second scenario with step-wise increase and decrease of demand. Azure shows lower quality scalability than EC2 in this respect, with the metric being 0.45 in the first scenario and 0.23 in the second. The metric values for OrangeHRM are:

Amazon EC2
  Step-wise increase and decrease: ηI = 0.5882, ηt = 0.5201
Microsoft Azure
  Steady rise and fall: ηI = 0.6532, ηt = 0.4526
  Step-wise increase and decrease: ηI = 0.5592, ηt = 0.2372

We note from the values of both metrics ηI and ηt for both clouds that the software system performed better, with respect to both volume and quality, in the first scenario, the steady rise and fall of demand, which is the more realistic and simpler demand scenario for many cloud-based software services. In general, we conclude that OrangeHRM performed better on Amazon EC2 in terms of quality scalability, while it performed slightly better on Azure in terms of volume scalability for the steady rise and fall demand scenario. In the case of the step-wise increase and decrease of demand, OrangeHRM performs considerably better on EC2 than on Azure.
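The two demand scenarios used in these experiments can be sketched as simple load profiles. The shapes below are illustrative assumptions only; the exact demand levels are those shown in Fig. 2:

```python
def steady_rise_and_fall(peak, steps):
    """First scenario: a linear ramp of demand from 0 up to `peak`
    and back down again."""
    up = [peak * k // steps for k in range(steps + 1)]
    return up + up[-2::-1]

def step_wise(levels, hold):
    """Second scenario: hold each demand level for `hold` ticks,
    rising through the levels and then falling back."""
    rising = [lvl for lvl in levels for _ in range(hold)]
    falling = [lvl for lvl in reversed(levels) for _ in range(hold)]
    return rising + falling
```

Profiles like these can drive a load generator so that the (Dk, Ik) and (Dk, tk) series needed by the metrics are collected under a controlled demand scenario.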

The big difference in the average response times for the software system running on the two cloud platforms indicates either that the software system is tailored better to the provisions of the EC2 system, or that Azure might have issues with the speed of service delivery for the kind of service software systems like OrangeHRM (or for some particular technical aspect of this software system). Both options raise interesting questions and opportunities for further investigation of the technical match between a software system and the cloud platforms on which it may run.

We used different software configurations, hardware settings, and workload generators in this set of experiments, comparing the scalability of the two cloud-based software services hosted on the same cloud platform. The metric values are:

OrangeHRM
  Step-wise increase and decrease: ηI = 0.5882, ηt = 0.5201
MediaWiki
  Steady rise and fall: ηI = 0.7556, ηt = 0.9664

We used the same software configurations, hardware settings, and workload generator in this set of experiments to measure the scalability under the two scenarios for the same cloud-based software service hosted on EC2, with different auto-scaling policies.
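An auto-scaling policy of this kind is essentially a pair of threshold rules on a monitored metric, typically the average CPU utilization of the scaling group. A minimal hypothetical sketch of such a rule follows; the function and its default threshold values are illustrative assumptions, not the settings used in the experiments:

```python
def scaling_decision(avg_cpu_percent, add_threshold=70.0, remove_threshold=10.0):
    """Return +1 (add an instance), -1 (remove an instance), or 0 (no change)
    based on threshold rules over average CPU utilization.
    Thresholds are hypothetical example values."""
    if avg_cpu_percent >= add_threshold:
        return +1
    if avg_cpu_percent <= remove_threshold:
        return -1
    return 0
```

Varying the thresholds (as in options 1 and 2 below) changes how aggressively instances are added or removed, which is exactly what the third comparison probes.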

The first set of policies are the default policies provided by the EC2 cloud when setting up an Auto-Scaling group (option 1). We picked arbitrary scaling policies for the second set of experiments (option 2). The auto-scaling policies used for this set of experiments are given in Table 6.

Remove instance: when 10% >= CPUUtilization > -infinity

The purpose of this kind of comparison is to see the effects on scalability performance of using the same cloud platform, the same instance types, and the same workload generators, but different auto-scaling policies. The average numbers of MediaWiki instances (option 2) for both scenarios are shown in Fig. 9A,B, and the average response times of MediaWiki (option 2) for both scenarios are also shown in Fig. 9. The corresponding metric values are given in Table 7, with the best values in italics.

In terms of average response time, we note that there are big differences in the average response times for the first scenario, as the average gradually increases from 2.035 seconds for demand size 100 to 9.24 seconds for demand size 800, while it increases from 1.02 seconds for demand size 100 to 3.06 seconds for demand size 800 in the second scenario, the step-wise increase and decrease.

Here we use the quality scalability metric defined by considering the system average response time. Alternative quality scaling metrics may be defined by considering other quality aspects of the system, such as system throughput or recovery rate [11]. Expanding the range of quality measurements provides a multi-factor view of quality scalability to support trade-off options in the context of QoS offerings when the service scales.

We understand the importance of and need for utility-perspective scalability metrics and measurements. Therefore, our proposed metrics can be integrated into the utility-oriented scalability metric proposed by Hwang et al. [11], by combining our metrics as the performance and/or quality components of their utility-oriented scalability metric. This allows the analysis of the scalability of cloud-based software services from both technical and production-driven perspectives.
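A minimal sketch of such an integration follows. The productivity-style form used here, with our two technical metrics as the performance and quality components divided by a cost term, is an illustrative assumption in the spirit of performance x QoS / cost; it is not claimed to be the exact formula of Hwang et al. [11]:

```python
def integrated_scalability(eta_i, eta_t, cost):
    """Utility-oriented integration sketch: technical scalability
    (volume metric times quality metric) per unit cost.
    The functional form is an assumed illustration, not the exact
    metric of Hwang et al. [11]."""
    return (eta_i * eta_t) / cost
```

For example, with eta_i = 0.6, eta_t = 0.5, and a cost of 0.1 in arbitrary units, the integrated value is 3.0; holding the technical metrics fixed, a cheaper deployment scores higher, which is the production-driven view the integration is meant to capture.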
The utility-oriented productivity metric P() of Hwang et al. [11] combines a performance measure with QoS and cost terms; substituting our technical scalability metrics as its performance and quality components leads to a re-definition of the utility-oriented metric. We calculated the integrated scalability metric, taking the corresponding service costs into account; the resulting values are:

OrangeHRM / EC2
  Step-wise increase and decrease: 23.18
OrangeHRM / Azure
  Steady rise and fall: 4.93
  Step-wise increase and decrease: 2.21
MediaWiki (auto-scaling policies option 1)
  Steady rise and fall: 14.04
  Step-wise increase and decrease: 7.15, 6.92
MediaWiki (auto-scaling policies option 2)
  Steady rise and fall: 14.02
  Step-wise increase and decrease: 6.64, 6.42

The technical scalability metrics that we used in this paper allow the scalability behavior of cloud-based software services to be explored in more detail. Some interesting scalability behavior has been noted through the analysis, such as big variations in average response time for similar experimental settings hosted in different clouds.

A case of an over-provisioned state occurred when using higher-capacity hardware configurations in the EC2 cloud.

We believe that the technical scalability metrics can be used in designing and performing scalability testing of cloud-based software systems, in order to identify system components that critically contribute to the technical scaling performance. We have shown the integration of our technical scalability metrics into a previously proposed utility-oriented metric. Our metrics can also be extended by considering multiple service quality aspects, and combined with a range of demand scenarios to support the fine-tuning of the system. Such