Modelling the Impact of Cloud Storage Heterogeneity on HPC Application Performance

Abstract: Moving high-performance computing (HPC) applications from dedicated HPC clusters to cloud computing clusters, an approach known as the HPC cloud, has recently been proposed by the HPC research community. Migrating these applications from the former environment to the latter can have a significant impact on their performance, due to the different technologies used and the suboptimal use and configuration of cloud resources such as heterogeneous storage. Probabilistic models can be applied to predict the performance of these applications and to optimise them for the new system. Modelling the performance of applications that use heterogeneous storage in the HPC cloud is a difficult task, due to the variations in performance. This paper presents a novel model based on Extreme Value Theory (EVT) for the analysis, characterisation and prediction of the performance of HPC applications that use heterogeneous storage technologies in the cloud together with high-performance distributed parallel file systems. Unlike standard approaches, our model focuses on extreme values, capturing the true variability and potential bottlenecks in storage performance. Our model is validated using return level analysis to study the performance of representative scientific benchmarks running on heterogeneous cloud storage at large scale, and gives prediction errors of less than 7%.


Introduction
High-performance computing (HPC) can be defined as the use of supercomputers to efficiently solve complex computational problems [1], while cloud computing is defined as ubiquitous and on-demand access to configurable computing resources [2]. Although these two technologies were initially designed for different purposes, recent attempts to integrate them have given rise to the new concept of the HPC cloud. Netto et al. [3] defined the HPC cloud as "the use of cloud resources to run HPC applications". Both new challenges and new opportunities have emerged from this work.
From around 2009 onwards, some HPC users started to consider the cloud as a cost-effective alternative to high-cost HPC clusters. The idea of a pay-on-demand business model that was flexible and provided more customer control over resources was tempting [4,5]. Cloud computing allows for the flexible provision of computing resources such as CPUs, memory, storage, networks and graphics processing units. These resources are elastic, meaning that they can be scaled according to the requirements of each application. Instantaneous access to computational resources can help users to deploy or test their applications when they need them [6].
The deployment of HPC applications over cloud computing clusters presents several challenges that have yet to be resolved. One potential problem concerns storage systems, as cloud clusters do not use the same types of storage systems as HPC clusters. Storage systems can also be considered an obstacle to the adoption of cloud systems for HPC applications [3,7,8]. Anticipating the performance of large-scale applications on heterogeneous storage systems is challenging: technical documentation on aspects such as the throughput and latency of storage technologies is not sufficient to predict the performance of HPC applications, given the impact of multiple sources of interference.
In this paper, we describe a statistical model based on Extreme Value Theory (EVT) that can be used to characterise and predict the performance of HPC applications that rely on heterogeneous storage in cloud systems. Our paper makes the following contributions:

•
We present an EVT-based model for characterising the performance of HPC applications that make use of heterogeneous storage technologies in cloud computing systems.

•
We develop a method for predicting the performance of HPC applications based on return level analysis with different numbers of storage nodes, which can inform storage algorithms and hence improve the performance of applications.

•
We evaluate the proposed model by using it to predict the performance of HPC applications running on heterogeneous cloud storage at large scales.
This paper is organised as follows: Section 2 describes the main concepts associated with heterogeneous storage in cloud computing infrastructures, parallel file systems for supporting HPC applications in the cloud, and Extreme Value Theory. In Section 3, we describe our approach to modelling the impact of using heterogeneous cloud storage on the performance of HPC applications, and we validate this model in Section 4 using return level analysis. Section 5 describes some of the main related works. Finally, we present conclusions and suggestions for future work in Section 6.

Background
In this section, we describe the storage systems currently used in general-purpose cloud computing environments, and show how these can be leveraged efficiently in HPC. We then focus on BeeGFS as a flexible and scalable alternative for the deployment of a parallel file system that is optimised for HPC on cloud infrastructure. Finally, we describe key concepts related to the modelling of application performance using EVT.

Heterogeneous Storage Systems in Cloud Computing
In cloud computing clusters, the performance of data-intensive applications is limited by disk data transfer rates, among other factors. To mitigate the impact on performance, cloud systems that offer hierarchical and heterogeneous storage architectures are becoming commonplace. The integration of different storage alternatives such as solid-state drives (SSDs), hard disk drives (HDDs), or even RAMDISK (a block of RAM used as volatile storage) may improve the performance of applications by taking advantage of the characteristics of each type of storage [9].
Table 1 presents the storage technologies and services offered by the top three cloud providers. It can be seen that cloud providers offer a variety of heterogeneous storage devices, each of which has certain particularities. For instance, Azure provides disks with high throughput, high IOPS and low latency (Ultra disk), but also provides a low-cost disk with standard throughput, IOPS and latency (Standard HDD). In addition to a general-purpose SSD, AWS provides a type of SSD with configurable IOPS (io2-io1), and another optimised for speed. Each of these disks is designed for particular use cases, and offers different capabilities such as volume size, maximum IOPS and maximum throughput. GCP also provides different types of disks and offers zonal and regional replication, as well as high-performance disks (Extreme Persistent Disk). The three main types of storage service are object, block and file storage, which make use of the available disk types.

Leveraging Heterogeneous Cloud Storage for HPC
Modern HPC systems are designed to provide high performance for scientific applications by offering low-latency networks such as InfiniBand and optimised distributed parallel file systems. LustreFS [10] and BeeGFS [11] are two of the most commonly used parallel file systems in the Top500 HPC clusters. Both of these parallel file systems are being increasingly deployed on cloud platforms [11][12][13], with the aim of achieving fast access to large amounts of data in cloud clusters.
The use of Lustre and BeeGFS helps to maximise the throughput of writing and reading large amounts of data in cloud clusters. In this work, we focus on the BeeGFS parallel file system, as it has certain features that make it particularly suitable for cloud environments, such as its simple installation. It is also available for different Linux distributions, is hardware agnostic, and supports high concurrency. An additional benefit is that it is unnecessary to host the management service on a dedicated machine.

BeeGFS Parallel File System
BeeGFS is a client-server model file system that was developed with a focus on performance and scalability. Figure 1 shows the architecture of BeeGFS, which is composed of three types of nodes: a management server, a metadata server, and an object storage server. The management server is the node in charge of the configuration of the BeeGFS file system and its other components; there is usually only one management server in a BeeGFS configuration. The metadata server contains the metadata target, a storage device that holds the structure of the file system and the file names. This server also manages indexes and namespaces. The object storage server is responsible for receiving the data sent from the client and storing it in the object storage target (OST). A given BeeGFS configuration may involve a large number of OSTs, and BeeGFS will try to store data efficiently by splitting the workload among them.
BeeGFS was developed with a focus on easy installation and management. Two of the most important aspects of this simple installation are that BeeGFS does not require a kernel patch, and that it comes with graphical Grafana [14] dashboards. BeeGFS also has a striping feature, for which the chunk size, the number of storage nodes to use (targets) and the heterogeneous storage support can be specified.
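For illustration, striping can be configured per directory with the `beegfs-ctl` administration tool; the mount point and values below are hypothetical, and the exact flags should be checked against the installed BeeGFS version:

```shell
# Set the stripe pattern for a directory on a mounted BeeGFS file system:
# 512 KiB chunks spread across 4 storage targets (illustrative values).
beegfs-ctl --setpattern --chunksize=512k --numtargets=4 /mnt/beegfs/data

# Inspect the resulting stripe pattern of a file or directory.
beegfs-ctl --getentryinfo /mnt/beegfs/data
```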

Modelling Extreme Values
EVT is a field of statistics that relates to the behaviour of exceptional or extreme values of a set of random variables, i.e., those which deviate from the median of the probability distribution [15]. EVT has been applied in areas such as statistical quality control [16], finance [17][18][19], transportation [20], the study of calcium content [21], and in hydrology to calculate the probability of floods in a certain period [22][23][24]. In computer science, EVT has been used to predict the performance of HPC applications [25] and the impact of interference sources on HPC application performance [26,27] on HPC clusters.
Our work focuses on the impact of storage operations on HPC applications provisioned in cloud environments. Standard statistical analyses focus on average performance and may overlook the significant outliers that affect HPC applications in cloud environments; EVT is needed to model these outliers and provide accurate predictions of worst-case performance scenarios. Measuring extreme execution times is crucial because it allows EVT to capture the system's true variability: outliers can reveal hidden bottlenecks or signs of future issues. By understanding these extremes, we can proactively allocate resources for peak loads and prevent outages. EVT also allows us to quantify the likelihood of such events, enabling systems to be designed with sufficient capacity. This focus on extremes, rather than averages alone, ensures more reliable and efficient resource provisioning.
EVT was developed based on the results of work by Fisher and Tippett [28,29]. These authors showed that the distribution of the maxima of a set of independent, identically distributed (i.i.d.) random variables with an unknown underlying distribution converges to one of three possible asymptotic distributions: Fréchet, Gumbel or Weibull. Jenkinson [30] introduced a generalised extreme value (GEV) distribution that combines these three distributions. Equation (1) shows the cumulative distribution function (cdf) of the GEV distribution.
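For reference, the GEV cdf referenced as Equation (1) takes the standard form, consistent with the parameter ranges given below:

```latex
F(x;\mu,\sigma,\xi) =
\begin{cases}
\exp\left\{-\left[1+\xi\left(\tfrac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & \xi \neq 0,\; 1+\xi\left(\tfrac{x-\mu}{\sigma}\right) > 0,\\[6pt]
\exp\left\{-\exp\left(-\tfrac{x-\mu}{\sigma}\right)\right\}, & \xi = 0.
\end{cases}
\tag{1}
```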
Equation (1) has three parameters: scale σ, location µ and shape ξ, where −∞ < ξ < ∞, −∞ < µ < ∞ and σ > 0. The scale parameter defines the dispersion, or variability, of the distribution; the location parameter defines where the distribution is centred on the real axis; and the shape parameter, also known as the extreme value index, determines whether the distribution is of the Gumbel (ξ = 0), Fréchet (ξ > 0) or Weibull (ξ < 0) type [31] (see Figure 2).
To fit the GEV distribution, samples of extreme values must be drawn from the measured values. There are two common methods for carrying out this process: peaks over threshold (POT) and the block maxima method (BMM). In the POT approach, a threshold is established and the samples above that threshold are selected. BMM consists of dividing the measured values into blocks of size n and selecting the maximum value from each block. BMM has been widely used in hydrology, where the block size is set to capture seasonal variation over several years when obtaining the GEV parameters [22]. Estimation of the three GEV parameters has been widely studied [32]. The best-known methods are maximum likelihood estimation (MLE) [33] and the method of L-moments (LMOM) [34]. MLE is the method most frequently used to estimate the GEV parameters in conjunction with BMM; its main advantages are asymptotic normality and reliable parameter estimation when a prior distribution is known. According to Smith [35], for the asymptotic properties of the GEV estimators to hold, ξ must lie in the range (−0.5, ∞). If ξ is in the range (−1, −0.5), estimators can be obtained but the asymptotic property is not established. Finally, if ξ is in the range (−∞, −1), estimators cannot be obtained.
Once the GEV model has been determined and its parameters have been calculated, it is possible to calculate the return level values of the extremes. A return level is defined as the value that will be exceeded on average only once in every N samples, or every N blocks of the distribution when using the block maxima method [36]. This return value can be calculated from Equation (2), in which F represents the GEV distribution and P is given by P = 1 − 1/i, where i is the return period, defined as the average length of time between events of the same or a more significant value. Return level analysis has historically been applied more often in domains such as finance [37] and hydrology [38] than in computer science [26].
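For reference, inverting the GEV cdf at probability P gives the standard closed form of the return level in Equation (2) (for ξ ≠ 0):

```latex
z = F^{-1}(P) = \mu - \frac{\sigma}{\xi}\left[1-\left(-\log P\right)^{-\xi}\right],
\qquad P = 1 - \frac{1}{i}.
\tag{2}
```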

Modelling Heterogeneous Cloud Storage Impact on HPC Application Performance
This section describes our proposed stochastic model for the analysis of heterogeneous storage performance and the way in which we estimate the parameters for this model, which allows us to explore the impact of heterogeneous storage on HPC cloud systems. Unlike previous approaches, our model leverages Extreme Value Theory (EVT) to address the variability and extremes in storage performance without assuming any a priori distribution of the storage times. This novel application of EVT provides a more flexible and accurate representation of storage performance in parallel distributed systems such as BeeGFS, thereby offering new insights into the performance dynamics of HPC applications.

Modelling Approach
The storage time in distributed parallel file systems such as BeeGFS is dominated by the time taken by the slowest storage node in the cluster. We use EVT to model the data storage performance in this type of system. To do this, we assume that the times that a distributed parallel file system takes to store the chunks of a file are i.i.d.
We use BMM to model the extremes. The block and sample sizes must be large enough to obtain a good model fit. Several methods have been proposed for testing the goodness of fit of a model [39,40]; these studies present approaches for selecting the block size and sample size when using BMM, and also suggest techniques for testing the asymptotic properties using graphs or by analysing the estimated parameter values. We describe a rule of thumb for testing the asymptotic property in Section 2.3.
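The rule of thumb for the asymptotic property (Smith's shape-parameter ranges from Section 2.3) can be written as a small check; the function name is our own:

```python
def gev_mle_validity(xi: float) -> str:
    """Classify a fitted GEV shape parameter per Smith's ranges."""
    if xi > -0.5:
        # (-0.5, inf): asymptotic normality of the MLE holds.
        return "valid"
    if xi > -1.0:
        # (-1, -0.5): estimators exist, but asymptotics are not established.
        return "no-asymptotics"
    # (-inf, -1): MLE estimators cannot be obtained.
    return "invalid"
```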

Estimating the Model Parameters
In our block maxima formulation, the block size is the number of storage nodes configured in BeeGFS, and we sample the execution time of the slowest node. After selecting a set of block maxima, we need to estimate the parameters of the GEV distribution. Henwood et al. [41] suggested a minimum of 60 blocks; we use 100 blocks and follow a method based on the normality test suggested in [40] to determine whether the values are normally distributed with a 95% confidence interval, in order to avoid biased estimates. We then apply MLE to estimate the GEV parameters.
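The procedure can be sketched as follows, with synthetic timings standing in for real measurements, and using scipy, which encodes the GEV shape as c = −ξ:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)

# Synthetic per-chunk storage times for 100 runs on an 8-node cluster
# (placeholder values; real data would come from the benchmarks).
times = rng.gamma(shape=2.0, scale=0.5, size=(100, 8))

# Block maxima: block size = number of storage nodes; keep the slowest
# node's time in each of the 100 blocks.
block_maxima = times.max(axis=1)

# MLE fit of the GEV parameters. NOTE: scipy parameterises the shape as
# c = -xi, so a Weibull-type fit (xi < 0) appears here as c > 0.
c, mu, sigma = genextreme.fit(block_maxima)
xi = -c
print(f"xi={xi:.3f}, mu={mu:.3f}, sigma={sigma:.3f}")
```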

Predicting Performance on Heterogeneous Storage
Predicting the performance of applications in cloud environments is essential for achieving efficient resource provisioning. We use return level analysis and our model to predict the performance of scientific applications using cloud infrastructure at scale. After obtaining estimates of the three GEV parameters, we can apply return level analysis based on Equation (2), as described in Section 2.3. In our model, the return period in Equation (2) is the block number used in the block maxima method.
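A minimal sketch of this prediction step, assuming hypothetical GEV parameters fitted on the 8-node cluster (illustrative values only, not the paper's Table 3):

```python
from scipy.stats import genextreme

# Hypothetical GEV parameters for one benchmark (illustrative values).
xi, mu, sigma = -0.2, 4.0, 0.5

# The return level for a return period of N is the (1 - 1/N) quantile of
# the fitted GEV; with block size = node count, it predicts the
# slowest-node storage time when scaling to N storage nodes.
# scipy uses the shape convention c = -xi.
levels = []
for n_nodes in (16, 32, 64, 128):
    level = genextreme.ppf(1 - 1 / n_nodes, c=-xi, loc=mu, scale=sigma)
    levels.append(level)
    print(f"{n_nodes:>3} nodes: predicted slowest-node time {level:.3f} s")
```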

Validation of the Model
In this section, we present a validation of our proposed model using representative scientific benchmarks. We study the performance of these benchmarks running on heterogeneous cloud storage, and use our model to predict performance at larger scales. The results are validated against simulations and observed values. To collect the initial data for fitting our GEV model, we configured a BeeGFS cluster consisting of eight nodes. We used a block size equal to the number of nodes to apply BMM and MLE for GEV parameter estimation. Subsequently, we expanded the cluster to 16, 32, 64 and 128 nodes to collect performance data at those scales and compare them against the results of the return level analysis.

Experimental Setup
For our experiments, we used nodes from CloudLab [42], a project of the University of Utah (Salt Lake City, UT, USA), Clemson University (Clemson, SC, USA), the University of Wisconsin-Madison (Madison, WI, USA), the University of Texas at Austin (Austin, TX, USA), the University of Massachusetts Amherst (Amherst, MA, USA), and US Ignite (Washington, DC, USA). Each of the configured nodes had three available storage targets (RAMDISK, SSD, HDD) (see Figure 3). For these experiments, we used c220g2 nodes. Each of these nodes had two Intel E5-2660 v3 10-core CPUs at

Benchmarks
We selected several representative benchmarks (see Table 2) to validate our model. These benchmarks replicate fundamental parts of applications and simulate their behaviour:

•
Iozone is a tool used for the analysis of file systems. It includes operations such as write, read, re-write and re-read, and allows for latency and bandwidth analysis. We use the write operation with a single-stream measurement so that BeeGFS receives each file as a single unit and then splits it over all the storage nodes.

•

Fallocate is a Linux function that is widely used as a benchmark for file systems, as it can preallocate new space for files of a specific size. This tool first analyses whether the available space in the file system is sufficient, and then reserves that space for the file if required.

Data Collection
For data collection, we followed an experimental methodology based on the recommendations in [49], to avoid erroneous measurements. Some of these recommendations concern record keeping, such as adding labels and metadata to the collected data so that they can be quickly retrieved and checked later. We used four benchmarks (Fallocate, Iozone, BT_C, and PIOS) and three types of storage (RAMDISK, SSD, and HDD), and aimed to ensure that each run of each benchmark had the same testing conditions in the cluster. We therefore developed a script that sampled the execution time of the slowest node for each benchmark, repeating every run 100 times and collecting all the possible benchmark-to-storage times at each iteration. The benchmark-to-storage time is the execution time per node, in this case the slowest node in our configuration that stores a piece of data.

Estimation of GEV Parameters
In order to estimate the GEV parameters for each benchmark, we used BMM and MLE with 100 blocks and a block size of eight. Table 3 shows the parameters obtained with this approach.
The results in Table 3 allow us to determine which of the three GEV distribution types gives the best fit. The results indicate that all the distributions are of the Weibull type (ξ < 0). The shape parameters in all cases are greater than −0.5, thus meeting the asymptotic condition for the GEV distribution. The runtime distributions of the selected benchmarks using BeeGFS on heterogeneous storage devices show similar tailedness, and therefore fit a Weibull distribution. Figure 4 also shows that although all the distributions are upper-bounded, BT_C on RAMDISK, BT_C on HDD, and PIOS on SSD are close to light-tailed (Gumbel). The skewness values show that all benchmarks except PIOS have the same tail location for the SSD and HDD experiments. PIOS has left-tailed skewness for HDD and right-tailed skewness for SSD. This is an interesting outcome, as PIOS on SSD also has the highest coefficient of variation, indicating that this experiment has the most variation in its data; this generates more extreme values, making the prediction process more challenging.

Return Level Analysis
In this section, we use return level analysis to predict the performance of HPC applications on BeeGFS. Figures 5-8 show the return values, computed using Equation (2), for 16, 32, 64 and 128 BeeGFS object storage servers, respectively. These figures present the results of the return level analysis compared to the observed values, which allows us to predict the performance of the application at scale and to compare this to the observed performance. The largest prediction error was 6.64% and the smallest was 0.46%, for PIOS on 64 nodes with RAMDISK and BT_C on 32 nodes with SSD, respectively.
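The prediction errors quoted here correspond to the usual absolute relative error; a trivial helper (the function name is ours) makes the calculation explicit:

```python
def relative_error_pct(predicted: float, observed: float) -> float:
    """Absolute relative prediction error, expressed as a percentage."""
    return abs(predicted - observed) / observed * 100.0

# e.g. a predicted 5.33 s against an observed 5.00 s gives 6.60%
print(f"{relative_error_pct(5.33, 5.00):.2f}%")
```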
The variations in the return levels between 16 and 128 nodes were larger for BT_C on HDD, PIOS on RAMDISK and PIOS on SSD (see Figures 7c and 8a,b); these were the experiments with shape parameters closest to zero, which often indicates higher variability in performance outcomes. For instance, the higher variability observed for BT_C on HDD could be due to the inherently slower performance and higher latency of HDDs compared to SSDs or RAMDISK. Similarly, the performance variability for PIOS on RAMDISK might be influenced by the differences in memory management and data handling inherent to RAMDISK storage.
The return level values for Iozone did not show differences as large as those for BT_C on HDD, PIOS on RAMDISK, and PIOS on SSD (see Figure 6). The return level behaviour of Iozone exhibits less variability than the other tools because of its single-stream measurement approach and the efficient file distribution handled by BeeGFS. This leads to more consistent performance metrics, as the tool focuses on straightforward write operations without the added complexity of the parallel I/O patterns in PIOS or the computation and communication overheads introduced by BT. The inherent design of Iozone and its specific use case in this study (i.e., writing a single stream) contribute to its stable and predictable performance, reflected in the less variable return levels observed in the analysis.

Related Work
Cloud platforms are increasingly incorporating SSDs into their storage systems. Huang et al. [50] proposed a black-box model to predict the performance of SSDs in terms of latency, bandwidth and throughput, by applying statistical machine learning algorithms. They evaluated their model using micro-benchmarks and real-world traces from online transaction processing (OLTP) applications, and recorded errors of 9% for the latency prediction and 1% for the bandwidth and throughput. Unlike our work, which considers different storage types, this work focused on predicting the performance of SSDs but not HDDs.
Mondragon et al. [26] presented a model for analysing the performance of bulk synchronous HPC applications based on the use of EVT. They used their model to characterise the impact of next-generation interference sources on applications and predicted the performance of applications at large scales. Their model obtained a prediction error of less than 7.4% for HPC applications running on HPC clusters. We also apply EVT, but to characterise and predict the performance of HPC applications using heterogeneous storage in cloud systems. Dominguez-Trujillo et al. [51] presented an approach for modelling variations in the performance of large-scale HPC systems using EVT, through a study of the maximum length of the distributed workload time interval for bulk synchronous HPC applications, using parametric and non-parametric ping. That work focused on the variability generated by the hardware and software used in HPC clusters; our work instead focuses on analysing and predicting performance variations for heterogeneous storage systems in cloud systems.
Another approach that used EVT was an analysis of the worst-case execution time (WCET) [52], in which the authors studied the accuracy of EVT in detecting the WCET of several processes. This approach has also been used for CUDA kernel tasks [53] and the analysis of automotive applications in embedded safety-critical systems [54], but not for the performance of HPC applications using heterogeneous storage systems in the cloud.

Conclusions
An increasing number of researchers are considering the use of the cloud to run their HPC applications. This work contributes to the adoption of the cloud as a feasible environment for HPC applications by providing a model for HPC applications in cloud systems and the integration of a high-performance storage file system such as BeeGFS into the cloud. In this paper, we have presented an EVT-based model for analysing, characterising and predicting the performance of HPC applications that use heterogeneous storage technologies in the cloud.
Modelling and understanding the performance of HPC applications that use heterogeneous storage can benefit both cloud providers and cloud users.An accurate prediction of the performance of HPC applications that takes into consideration the storage performance can help cloud providers to offer this feature in their platforms, and can help cloud users to identify the resources they will need more specifically.
In addition to predicting storage performance, our extreme value model can be used as guidance for designing intelligent data placement algorithms for the heterogeneous storage infrastructure that cloud providers increasingly offer, taking advantage of the characteristics of each storage type as well as data locality and data access patterns.
Although our model obtained accurate results with low prediction errors, modelling applications that exhibit heavy-tail storage time distributions such as PIOS is challenging.In those cases, the model might benefit from the use of techniques such as statistical bootstrapping or smoothing sample extremes, in order to more precisely fit collected data to distributions using a small number of samples.
One future direction for this work would be to include in the model other storage devices such as the new NVRAM.The development of a model of the performance of HPC applications based on the use of heterogeneous storage in cloud systems represents only one piece of a larger challenge, and additional cloud resources or even multi-cloud resources could be integrated into the model in future work.
•

BT was developed as part of the NAS Parallel Benchmarks (NPB) and is extensively used for testing HPC clusters. It solves a highly configurable block-tridiagonal (BT) problem using MPI.

•

PIOS is a test tool created to work as an I/O simulator on file systems. This tool simulates a load from many clients that generate I/O on file systems. Due to its parallel nature, PIOS can write to the same or different files at the same time, but in this study it is used to write only to a single file.

Figure 4. Density Plots (in seconds) for Block Maxima Sample.

Figure 5. Performance prediction for Fallocate on BeeGFS using heterogeneous storage.

Figure 6. Performance prediction for Iozone on BeeGFS using heterogeneous storage.

Figure 7. Performance prediction for BT_C on BeeGFS using heterogeneous storage.

Figure 8. Performance prediction for PIOS on BeeGFS using heterogeneous storage.

Table 1. Storage services offered by the top three cloud providers.

Table 2. Selected benchmarks for performance analysis.

Table 3. Estimated GEV parameters for the block maxima samples.