Evaluating performance variations cross cloud data centres using multiview comparative workload traces analysis

How to evaluate the performance variations of large-scale cloud data centres is challenging due to diverse nature of cloud platforms. Classic methods such as profiling-based evaluating methods tend to only provide global statistics for a system compared with cloud tracing based approaches. However, existing tracing based research lacks a systematic comparative multiview analysis from architecure-view to job-view and task-view, etc.to evaluate cloud performance variations, together with a detailed case study. We introduce MuCoTrAna, a multiview comparative workload traces analysis approach to evaluate the performance variations of large-scale cloud data centres which assists the cloud platform performance managers and big trace analysts. The efficiency of the proposed approach is demonstrated via case studies in Alibaba 2018 trace and Google trace. The multifaceted analysis results of traces reveals the qualitative insights, performance bottlenecks, inferences and adequate suggestions from global view, machine view, job-task view, etc.


Introduction
How to evaluate the performance variations of large-scale cloud data centres (Daid et al., 2021;Guo et al., 2019;Javadpour et al., 2021;Yu et al., 2021;Zhao et al., 2021) is challenging due to diverse nature of cloud platforms. Classic methods such as profiling-based evaluating methods tend to only provide global statistics for a system compared with cloud tracing-based approaches. With increasing diversity and complexity of cloud platforms, diagnosing the performance variations across large-scale cloud platforms by analysing the workload trace of data is gaining attention from research community (Amvrosiadis et al., 2018;Ogbole et al., 2021;Xiao et al., 2021). Amvrosiadis et al. (2018) reported that one challenge in the cluster workload traces analysis is the diversity and hence only using trace from one platform can't show the generality of a technique.Through the analysis of Alibaba trace, Luo et al. (2021) found that most microservices are more sensitive to CPU than memory in Alibaba cluster. Our motivation is based on the urgent need to conduct a detailed analysis by comparing various traces cross platforms. In addition, diagnosing cloud platform performance bottlenecks can be classified into two schemes namely, inner-platform performance analysis and cross-platforms performance analysis. Despite extensive research in the areas of inner-platform performance analysis under single cloud platform (Yu et al., 2021), such as FaceBook trace (Facebook, 2010) and taobao dataset (Ren et al., 2013), the cross-platform quantitative deep performance comparative analysis from multiviews via traces analysis has been overlooked. Moreover, Amvrosiadis et al. (2018) reports that most traces are confidential and therefore cannot be used for comparative studies.
Two of the most representative cloud traces are the Google trace (Google, 2011) which is the classic trace collected in 2011 by Google; and the Ali2018 (Alibaba, 2018) which was released in December 2018. Although multiview learning is effective, it is reported in Amvrosiadis et al. (2018) that there exist hundreds of research articles on Google trace (Peng et al., 2018;Zhang et al., 2016) and Alibaba trace (Guo et al., 2019). However, an empirical study on diagnosing performance variations between Google trace and Ali2018 using a comparative multiview method from overall to detailed tasks is missing. Similarly, although the trace analysis tool is an effective tool for empowering developers to monitor system anomalies and analyse performance bottlenecks without a deeper understanding of the system, the trace analysis tools for the comparative analysis of traces are lacking.
We introduce MuCoTrAna, a multiview comparative workload traces analysis approach for diagnosing performance variations in large-scale cloud data centres. The initial idea of this work was presented in Ruan et al. (2019), which only demonstrates the problem and limited solutions for some initial analysis. In this paper, we systematize our method by extending the background ideas, providing a more comprehensive survey of the recent related work, deepening the design of the architecture and analysis process, and integrating more new interesting findings together with adequate discussion and suggestions. Secondly, we present a comparative empirical study on Google cluster and Ali2018 and reveal several diverse novel interesting findings. Our findings and discussion would be of great interest to cloud traces analysts and platforms managers as such details of practical nature have never been reported before. Performance variations from the global view of data structure, machines, jobs and tasks, etc., are carefully investigated. We also design a multiview comparative workload trace analysis tool that provides a user-friendly unified visual support for comparative, multiview, spatial and temporal analysis of large-scale cloud traces.
The remainder of this paper is organised as follows: Section 2 provides a brief overview of the related work. The design of MuCoTraAna is presented in Section 3. The case study findings and results analysis are carefully studied in Section 4. The design and implementation of the trace analysis tool are presented in Section 5. The paper is concluded in Section 6.

Related work
The efficiency of diagnosing performance variations based on trace analysis is determined by traces, relevant methods and applicable tools (Liu & Yu, 2018;Lu et al., 2017;Tang et al., 2021;Xu et al., 2021). In this section, we survey the recent related work on the above aspects.
In the view of the trace data source, Amvrosiadis et al. (2018) reported that most of the traces are still secret and cannot be compared. Google trace was collected in 2011 and Ali2018 was collected in December 2018. They are the most popular representative open-source traces which can be used to demonstrate how to diagnose the variations and evolution between cloud platforms.
In the view of the recent research progress in trace analysis, we classified the recent representative research papers in Google and Alibaba traces in Table 1. First, we analyse the research on Google trace. In the machine view, the existing researches focus more on the heterogeneity of machine configuration and the dynamics of machine resources, e.g. revealed the daily trend of machine events, and Reiss et al. (2012a) found that there were about 40% of the machines unable to be scheduled at least once. In the tasks view, Reiss et al. (2012b), Ruan et al. (2019), Reiss et al. (2012a) and Di et al. (2012) revealed the long tail phenomena in terms of the job scale. For the execution process of jobs and tasks, the existing research reveals that there is a relationship between the trends of the different states of tasks. Liu et al. (2016) proposed that adding machines will reduce the failure rate of the jobs. Gao et al. (2020) analysed the Google trace and proposed a failure prediction algorithm based on a deep learning method to identify task and job failures in the cloud. Reiss et al. (2012a), Reiss et al. (2012b) and Ruan et al. (2019) reported that a few jobs consumed the major amount of the available resources. In the resources view, Reiss et al. (2012a), Reiss et al. (2012b), Chen et al. (2014) and Alnooh and Abdullah (2018) studied the relationship of the CPU and memory of jobs and tasks. Guo et al. (2019), Shan et al. (2018) and Liu and Yu (2018) reported that the modern data centres have the disadvantage of low resource utilisation, which is the main performance bottleneck. Reiss et al. (2012b) investigated the factors that affect the efficiency of system resources. However, there are still some shortcomings in the existing researches. First of all, they did not represent systematic multiview features and the analysis method to diagnosing performance variations based on trace. These studies are also lack of clarity on the resource efficiency of Google trace. For Alibaba trace, it is worth noting that a previous version, called Ali2017, of Ali2018 was released by Alibaba company in 2017. Existing research on Alibaba trace focuses on workload analysis and resource utilisation (Guo et al., 2019;Liu & Yu, 2018). Liu and Yu (2018) made a introduction to Ali2017 from batch job, container and server views. Resource utilisation and performance of Alibaba data centres are studied. Based on our survey, we found that like Google, Alibaba also divides jobs into many tasks, has duplicate submissions of jobs and tasks, keep daily patterns of workload, low resource utilisation and so on Guo et al. (2019). A few papers have made a comprehensive deep comparison between Google trace and Alibaba trace. For example, Shan et al. (2018) point out only around half of the CPU and memory are utilised in both Google and Alibaba cluster. Guo et al. (2019), Liu and Yu (2018) and Cheng et al. (2018) also raised this serious performance bottleneck problem. Everman et al. (2021) exposing the problem of insufficient CPU resources for the jobs in Alibaba trace. However, these papers only compare the two traces in terms of workload and resources, they lack a multiview perspective and no tool support. Moreover, their comparison is relatively imbalance with one trace of Google trace or Alibaba trace.
Trace analysis tool is an effective tool to alleviate the challenge of trace data surging, and enabling developers to monitor system anomalies and analyse performance bottlenecks without the detailed information about of the architecture, components, etc. Doray and Dagenais (2017) proposed a tool for highlighting multiple executions of the same task by simplifying the trace analysis process to help the trace analysts to have a deeper understanding of the system. Adhianto et al. (2010) developed a tool for measuring and analysing the performance of compute nodes in HPC clusters. Giraldeau and Dagenais (2015) introduced a new tool that can display the task execution process in the form of a graph model for many tasks in the distributed system. The above tools pay more attention to the tasks in the system, but cannot reflect the other information such as machines and users views. Although Giraldeau and Dagenais (2015) visualised the task execution state, it is limited to the temporal state of the task. In the cloud trace, usually multiple tasks in the job are executed on different nodes, which can reflect the spatial state. Balliu et al. (2016) proposed a tool for big data trace analysis BiDAl (Big Data Analyzer) and apply it only to Google trace but multiple traces. Versluis et al. (2020) surveys the use of workflow trace and introduce the Workflow Trace Archive(WTA), which supports sharing workflow traces in a common, unified format. Fernandez-Cerero et al. (2020) presented a model to automatically retrieve execution workflows of existing data-centre logs by employing process mining techniques which facilitated the analysis of workflows in the data centre.However, other related prototype systems only focus on a certain view of information in the trace, such as workflow, and do not analyse the information in the trace from a multi-dimensional perspective. Compared with these tools, our tools are user-friendly and unified visual support for comparative, multiview,spatial and temporal analysis of large-scale cloud trace.In addition, our tools can help compare the characteristics of different clusters and effectively discover the shortcomings of cloud computing clusters.
To sum up, for diagnosing the performance variations, George Amvrosiadis et al. (2018) suggested that attention should be paid to diversity in trace analysis. However, recent works focus more on inner-platform performance analysis but cross-platforms. Although recently some of them compared two kinds of traces, but their comparison is not deep and balanced enough in multiview comparative study. There lacks a comparative approach which provides a guidance for the system performance analysts. And the trace analysis tool which can support the trace comparative study are even rare. Therefore, we propose to diagnose the cloud platform's variations by comparatively studying the cloud clusters and proposes a tool together with the methods.

Design of multiview-based trace data analysis approach-MuCoTrAna
In this section, we introduce MuCoTrAna which diagnoses the cloud performance variations based on a multiview comparative trace data analysis. Section 3.1 introduces the multiview based trace analysis model. The corresponding analysis process is presented in Section 3.2.

Multiview-based trace analysis model
Classic methods like profiling based diagnosing methods tend to only provide global statistics for a system compared with the Cloud tracing approach. Multiview analysis and learning is effective in other fields. However, existing tracing based research lacks a systematic comparative multiview analysis approach. Therefore, in this section, we introduce a multiview comparative analysis performance variation diagnosing architecture -MucoTrAna.
In cloud trace analysis, the multiview analysis is a cloud trace data analysis from multiple trace feature sets. The cloud cluster trace contains implicit features about tasks, resources, jobs and failures among trace tables. Through the multiview cloud trace analysis, performance variations patterns, e.g. architecture, job scale, resource utility, can be discovered for performance optimisation. To diagnose performance variations between large-scale cloud datacenters, the first challenge is how to organise such multiview data and extract the features from them.
We propose a performance variation diagnosing framework between cloud data centres based on multiview comparative workload analysis (see Figure 1). Based on this framework, we need to establish the mapping by analysing the cluster trace, extract the spacial and temporal distribution features of traces, establish the global views and subviews and then provide the new findings. We diagnose the performance variations by providing the inferences and performance improvement suggestions.

Multiview-based trace analysis process
Our multiview-based trace analysis process is presented in Figure 2. To make it clear, we highlight the main steps as follows.
(1) Based on the overall view, the trace statistics and data structures are analysed from the trace log database.
(2) The spatial and temporal feature views are established based on the trace features and the trace analyst and application expert' knowledge. For example, the machine, job,  (4) The spatial and temporal analysis from multiple views is performed based on the performance optimisation goals. (5) The trace data analysis results are Visualized using the trend analysis, tag cloud analysis, etc. (6) The performance variations between different traces are analysed by the carefully comparisons. (7) The new findings and careful inferences about the performance variations based on the qualitative and quantitative analysis are presented.
To better understand the process, a case study together with the results obtained from Google trace and Ali2018 is shown in Section 4. All of the data analyses and new findings are presented based on these two publicly released data sets. Although these two publicly released data sets cannot represent the comprehensive performance trace sets for all cloud platforms, our analysis methods and the finding results can still provide the system analysts a meaningful quantitative scientific deduction method on evaluating the performance of the whole system in the form of a miniature. We believe that the analysis methods and new findings will surely benefit not only the research but also the practitioners' side about the Cloud performance variations diagnosing, performance improving and trace analysis.

Case study findings and results analysis
Under the proposed framework and analysis process in Section 3, the data table of Google and Ali2018 according to the global view detailed in Section 4.1 are clarified. From Section 4.2 to 4.4, the Machine analysis, Job analysis and Task analysis will be shown as an example.

Global view-based data tables classification
The global view-based analysis is performed based on the overall statistics (especially the difference and similarity of scales in the different views) of the traces to be compared for performance variations.
The trace data tables in Google trace and ali2018 according to the views are analysed to establish the mapping relationship based on the framework ( Figure 1). As is summarised in Table 2, many interesting findings can be inferred as the following.
• From the sampling day view, Google trace is nearly four times of data per sampling day than Ali2018. We infer that Google trace is somewhat better for those administrators who care about investigating the longer-time system performance analysis based on the trace view than the Ali2018. • The number of machines on Google's cloud platform is more than three times that of Ali2018. We may infer that the sampling Google platform has bigger machine scale than that of Alibaba cloud. • The number of tasks per job in Ali2018 is ten times smaller than that in Google trace. We also infer that compared with Google trace, the jobs in ali2018 are divided into smaller granularity tasks on average. • The information of cluster users can be directly characterised in Ali2018, but Google trace lacks this feature. Such customisation of cluster users may show that the design of Alibaba facilitate its performance based on the users's features.

The machine-view-based analysis
Machines being the fundamental hardware-level components whose configuration, and events, and so forth, are critical to the cloud platform's performance and thus the focus of our work for diagnosing performance variations.

Machine configuration analysis
In this section, we first investigate the machine configuration. Platform is one important view in large-scale Cloud clusters. We summarise the preliminary statistics on Table Machine events. We can identify that there are four platforms, which are four opaque strings in the Google trace table, named as platform A/B/C/D by us for convenience as shown in Table 3. The main steps of our approach are first traverse the Machine events table; then deduplicate the Machine events; count out the number of machines by counting the records; finally, the amount of memory resources is obtained from the memory field. Table 3 shows the resources usage from the view of the platform in Google trace.
From Table 3, we find that there are no cases where the machine switches the platform multiple times in the Google platform. In the publicly released traces, the memory capacity data is a normalised value. Thus, all of the memory capacity data have no unit of measurements. In order to study Google's machine resource configuration, we analyse the CPU capacity of all machines in Google trace which has been normalised into three types: 0.25,     Figure 3 and Table 4. In Google trace, most machines (92.66%) are configured with 0.5 CPU capacity, and about half of machines (53.50%) are configured with memory resources in [0.25,0.5) ( Table 4). We can infer that in Google trace the CPU and memory capacity of the machine are different. What's more, 53.48% of the machines have the same resource capacity configuration with CPU capacity of 0.5 and memory capacity of [0.25,0.5].
In Ali2018, there are 4043 machines with unique machine_id. By analysing the CPU and memory capacity of Ali2018, we focus on whether Ali2018 has a fine-grained resource management model like Google trace.
From Table 5, we choose the record mem_size and the record cpu_num as the memory capacity index and the CPU capacity index, respectively. We found that the CPU capacity of each machine is 96 cores, and the normalised memory capacity is 100. In Table 5, We can infer that Ali2018 with the same resource capacity configuration may manage CPU capacity and memory resources more coarsely than Google trace with different resource capacity configuration.

Machine events
Add, Remove and Update are three types of machine events in Google trace (Figure 4). The Add event occurs when the machine joins a cluster and can perform task. The number of Add records is far more than the other two types of records. Remove event refers to machine failure. There are a total of 8,957 Remove records in Google trace. The Update records are changes to available resources. According the difference of the number of records between Add and Remove, we find that some machines rejoin the cluster after being deleted when running on Google platform. After these statistics, we found that most but not all of the removed machines were need to be added again. 5,141 machines have removal records, and 5,097 machines were removed and again joined to the cluster. The most representative one is the machine with ID 4246147567. It has been performed of 164 times multiple additions and removals. And the machine with ID 6201459631 is 94 times. We record the machine usage time as T use , the machine removal time as T remove , and the total time as T,M is the machine time usage rate. Here T takes the value of the time of the last record in the machine events table.
We calculated the M values of the above two machines to be 27.18% and 57.48% by Equation (1). From this we can know that frequent removal and addition of machines affects machine utilisation. Figure 5 shows the trend of three machine events over time in Google trace.Events here represents cumulative events per day. By the analysis of adding and removing records for each machine, we can conclude that 99.14% of the machines are removed and then rejoined into the cluster. On average, each machine is removed and added at a time interval of 4.88 hours, so the line trends of Remove and Add are consistent in Figure 5.  There are only two states, Using and Import in Ali2018. The majority state of the machine is Using, and only the machine numbered 2739 has produced 5 records with the state of Import.
Finding 1: Ali2018 has more coarse-grained resource and event management mechanism than Google trace. Ali2018 is homogeneous, while Google trace is heterogeneous in terms of resource capacity configurations.
Because we know that with the increase of platform scale and application complexity, system software tends to adopt more fine-grained management. We suggest that Alibaba can improve its machine and event management module to support more fine-grained management in its next-generation platform, so as to optimise its performance.

The jobs and tasks view-based analysis
Job and task scale, scheduling time, resource usage, etc., are key factors to determine the system performance. This section shows our method to analyse the above factors in Google trace and Ali2018. Figure 6 presents the scale of jobs of various sizes in Google trace and Ali2018. It can be seen in Figure 6(a,b), about 75% of the jobs in Google trace contain only one task, while jobs with less than 500 tasks account for more than 98% of the total jobs. From Figure  6(c,d), in Ali2018, 40% jobs in the total of 14 million jobs have only one task, while 99.6% of jobs contain less than 30 tasks. i.e. compared with Google trace, the jobs in Ali2018 tend to be composed of small tasks. We then analyse the task scale by establishing task-scale tag clouds. The job-scale tag cloud for Google trace is depicted in Figure 7. The value of the number represents the job set whose elements are the special job that has the same amount of tasks as the value. E.g. 1 represents the job set whose element is the special job that is composed of single task. 15 represents the job set whose element is a job that is composed of 15 tasks. The size of the number represents the number of job sets whose element contains the same amount of tasks. e.g. The size of 1 in Figure 7 represents there are 506,648 jobs in the Google trace where each job has 1 task. The size of 10 in Figure 7 represents there are 2025 jobs in the Google trace where each job has 10 tasks. From the above comparison, we can find that the size of 1 is much bigger than 10. From Figure 7, we can find that most jobs contain only one task, because the other numbers (like 15,2) are much smaller than 1. It can be also seen that the number of tasks concentrates more on round numbers such as 2, 15, 50, 100, and 1000 in Figure 7. Most of the static task dividing method is based on an integer division dominant mechanism in the Google trace because most of the jobs contain integer (10/100/1000/ 10,000, ect.) tasks.

The job-scale analysis
Finding 2: The majority of the jobs in Google trace and Ali2018 contain only a small amount of tasks. In Google trace, 75% of jobs in Google trace contain only one task, and about 98% of jobs contain less than 500 tasks. In Ali2018, about 99.6% of jobs contained less than 30 tasks. The manual static task dividing method is a major mechanism.

Scheduling time
We record the length of time between task submission and task scheduling as T scheduling time , the timestamp of the task submission is T submit , the timestamp of task scheduling is T scheduling .
This section mainly explores the distribution of scheduling time in Google trace and the factors that affect scheduling time.
• The relationship of task distribution and scheduling time for scheduling classes Table 6 studies the relationship between scheduling class and scheduling time. We provide an algorithm to find the maximum scheduling time by traversal, as shown in Algorithm 1. Its time complexity is O(n). In our research, it needs to run four rounds in total because there are four scheduling classes in Google trace. Initially, we count the total number of tasks for each scheduling class and use the maximum time(MT), average time(AT), less than one second progress(L1) and more than 1000 s progress(M1000) as indicators to study the relationship between scheduling time and the task distribution. Taking the column with the scheduling class 0 as an example, L1 is 6.91% means that 6.91% of the tasks whose scheduling class is 0 have a scheduling time shorter than 1s. Similarly, for the tasks with scheduling class 0, 28.23% of the tasks scheduling time is shorter than 2s and 62.93% for less than 5s, 2.21% more than 1000s. Figure 8 reflects the relationship between the scheduling time and the distribution of tasks for each scheduling class. From the results in Table 6 and Figure 8, we can draw the following conclusions: (1) -The maximum and average scheduling time for tasks whose scheduling class is three is the smallest.
(2) -When the scheduling time is short, compared with tasks whose scheduling class is 0, 1, or 2, the proportion of the tasks whose scheduling classes are three is smaller. However, the scheduling time of the tasks whose scheduling class is three is less than 1000 s. It can be concluded that, the higher the scheduling class, the lower the maximum scheduling time and also lower is the probability that the task will starve to death. On the other hand, the average scheduling time is not inversely proportional to the scheduling class.
• The relationship of scheduling time and task priority Table 7 studies the relationship between priority and scheduling time. We first count the total number of tasks for each priority, and use the six indicators to study the scheduling time distribution. According to the priority value, the priority is divided into five types, which are free ( 0-1), other ( 2-8), normal production (9), monitoring(10), and infrastructure (11). From Table 7 and Figure 9, we have the following findings: (1) -Based on the analysis of the maximum time and average time, we can find that except for the tasks with infrastructure priority, the other tasks with relatively lower priorities like 0, 1, 2 have longer scheduling time than the others. (2) -Although the tasks with priorities from 2-8 belong to the same priority class other, the scheduling time of tasks whose priorities are 3, 5, 6, 7 differs greatly from those tasks whose priorities are 2, 4, 8. The tasks whose priority is 3, 5, 6 or 7, finish their  scheduling in such a short time that their scheduling time is all within 1000 s. In contrast, the tasks whose priority is 2, 4, or 8, finish scheduling with longer-time values and hence shortest maximum scheduling time is 16,324 s. (3) -The tasks whose priority is 11 have a maximum scheduling time long up to 20 days while the total sampling time for the whole data set is only 29-days. The average scheduling time of priority 11 is more than 8 h. Moreover, 62.7% of the tasks, whose priority is 11, have a scheduling time which exceeds 1000 s. (4) -It is worth mentioning that the scheduling time follows Pareto's law. As Figure 10 is shown, more than 90% of task scheduling time is shorter than 200s. It means that the scheduling algorithm of Google cluster is relatively effective.
Considering the above findings, we can deduce that it is opposite to our common intuition that the scheduling time of tasks with higher priority is shorter than those with the  lower priority. For example, the task with the priority value of 8, 11 has a higher priority but their scheduling time is not short. Figure 11 calculated the number of tasks submitted by Google trace and Ali2018 within 24 h. From 1 a.m. to 11 a.m., the number of jobs submitted by Google trace is significantly higher. The number of jobs submitted at 2 a.m. is five times that of jobs submitted at 2 p.m. We conclude that the workload on Google trace is unbalanced at different times on the same day. Ali2018 starts at 12am. The number of job submission remained at a high level for 9 h per day. We can deduce that the workload of Ali2018 is uneven. Improving some timeinsensitive tasks's spacial locality will improve system performance in both Google trace and Ali2018.

Daily patterns of workloads
Finding 3: The workloads of Google trace and Ali2018 vary greatly at different times of the day. The number of tasks submitted by Google trace during peak hours is five times that of idle time. The workload of Ali2018 has maintained a high level of 9 h in 24 h.
From this finding, we think the reason for submitting busy tasks during peak hours can be diagnosed, so that the corresponding measures, such as submission and workload balancing algorithm, can be applied to the platform.

Task failure analysis
Task failure analysis is an important way to analyse the cause of task failure, improve scheduling and resource allocation, and system efficiency. Although Jassas and Mahmoud (2018) suggests that only 35.91% of the tasks would be successfully completed in the end, it did not report the failure ratio. As can be seen from Figure 12,more than 30% of the jobs failed, including jobs that are evicted, killed, lost or failed. This finding shows that Google trace has a high failure rate of jobs and tasks. Similarly, authors in Garraghan et al. (2014), Chen et al. (2014), Jassas and Mahmoud (2018) and Rosa et al. (2015) performed an analysis of the failure of Google tasks. They mainly analyse the aspects of scheduling class, priority and resource usage. In this paper, in addition to studying the relationship between scheduling classes and priorities for task failure rate in the above sections, we also studied the relationship between machine and task failure. Figure 13 shows the relationship between various scheduling class and task event type ratio. We observe that for the tasks whose scheduling class is 3, the task completion rate  is approximately zero. In addition, the kill rate of the tasks is the largest for the tasks whose scheduling class is 2. The failure rate of the tasks reaches the maximum for the tasks whose scheduling class is 3, which also indicates that the scheduling class will affect the task failure rate. Therefore, we can suggest that in Google's task scheduling, more attention should be paid to tasks whose scheduling class is 3. Figure 14 shows the relationship between various event type ratios and task priorities. The task failure rate first decreases with the increase of priority and then rises. Similar to the scheduling time, the task with the priority classed as other has the lowest task failure rate, and the task with a priority of 5 has a task completion rate of 1. It is worth noting that all submitted tasks with a priority of 7 are killed. We speculate that tasks with too low a priority are more likely to fail when competing for resources resulting in higher failure rates. Tasks with a high priority are highly sensitive to time resulting in higher failure rates.

The temporal and spatial characteristics of tasks
In addition to the characteristics of the task itself, the spatial and temporal characteristics also affect the failure rate of the task. Figure 15 shows the task failure rate on 12,583 machines in Google trace within 29 days. We can find that Google trace has a high task failure rate on the 2nd and 10th to 17th days. We deduce that this phenomenon is related to the surge in the number of tasks submitted. This shows that the workload of each machine in the Google cluster has a similar trend. We can propose that if we want to improve the task completion rate, we can submit the batch task in the idle time, and in the event of a surge in workload, we can increase the amount of machines or resources in time so as to reduce the task failure rate.

The summary of the important findings
For the convenience, we summarise all of the findings from the global views in the following.
• From the sampling day view, Google trace is nearly four times of data per sampling day than Ali2018. We infer that Google trace is somewhat better for those administrators who care about investigating the longer-time system performance analysis based on the trace view than the Ali2018. • The number of machines on Google's cloud platform is more than three times that of Ali2018. We may infer that the sampling Google platform has a bigger machine scale than that of Alibaba cloud. • The number of tasks per job in Ali2018 is ten times smaller than that in Google trace. We also infer that compared with Google trace, the jobs in ali2018 are divided into smaller granularity tasks on average. • The information of cluster users can be directly characterised in Ali2018, but Google trace lacks this feature. Such customisation of cluster users may show that the design of Alibaba facilitates its performance based on the users's features.
We summarise the findings from the subviews in the following: • Finding 1: Ali2018 has more coarse-grained resource and event manage-ment mechanism than Google trace. Ali2018 is homogeneous, while Googletrace is heterogeneous in terms of resource capacity configurations. • Finding 2: The majority of the jobs in Google trace and Ali2018 contain only a small amount of tasks. In Google trace, 75% of jobs in Google trace contain only one task, and about 98% of jobs contain less than 500 tasks. In Ali2018, about 99.6% of jobs contained less than 30 tasks. The manual static task dividing method is a major mechanism. • Finding 3: The workloads of Google trace and Ali2018 vary greatly atdifferent times of the day. The number of tasks submitted by Google traceduring peak hours is five times that of idle time. The workload of Ali2018has maintained a high level of 9 h in 24 h.

A multiview comparative analysis based trace analysis tool
We present a design of prototype tool(MuCoTrAna) to provide an unified visual process interface for comparison and analysis of various traces on multiview, spatial, and temporal levels. MuCoTrAna offers trace analysis as well as diagnosing performance variations among large-scale Cloud data centres based on a multiview comparative workload. In our design idea, cloud performance variation diagnosing tool via trace analysis (e.g. MuCoTrAna) should consist of four layers ( Figure 16): Cloud data centres layer, Trace DB layer, Trace Analysis Engine layer, and GUI layer.

Cloud data centres layer
Cloud data centres layer is the source of Trace DB layer. It provides detailed chronological workload traces sources.

Trace DB layer
Trace DB layer contains different traces from various cloud data centres. It stores not only classic traces like Google trace, Alibaba trace 2017, and Alibaba trace 2018, but also some self-defined traces, which aims to enhance the extensibility of the tool in the data source level.

Trace analysis engine layer
Trace Analysis Engine layer is the core of MuCoTrAna, which includes basic operation and algorithms analysis. The basic operation on traces includes but not limited to open, edit and save traces. The analysis algorithms support multiview and comparative analysis. This engine can execute analysis operations from a single trace to multiple traces. Our tool can operate in three comparative modes: single vs single, single vs another, and single vs multiple. Single vs single compares different attributes in the same trace, i.e. inner-platform performance analysis. Single vs another means to compare the similar attribute between two different traces. While single vs multiple means to compare the attributes in more than three traces. Besides, we incorporate prediction and scheduling functions in the tool.

The GUI layer
The GUI layer is a user-oriented interface of the tool. Through the graphical interface, the user can easily choose the trace and analyse the trace through mouse clicks. The results of the analysis are displayed in spreadsheets or graphs.
We perform a performance diagnosing case study based on MuCoTrAna in Figures 17and 18. Figure 17 shows the samples of the Ali2018 and Google trace are compared from different aspects. Our tool can also visualise the comparative results for the traces as shown in Figure 18. From the case studies, we show that MuCoTrAna provides support for user-friendly multiview comparative analysis.

Conclusion
We proposed a multiview comparative analysis method and performed a case study on two classical traces: Google trace and Ali2018. Results show that the method can help to point out the good design, bottlenecks, etc., from the architecture-level to machinelevel. The comparative empirical study findings can provide insights on how to feature the bottlenecks in cloud big traces. It was found that most of the cloud platforms besides the traces studied in our case study also have the same data items with different item names though and can describe its architecture and resources. We found some interesting findings such as 75% of jobs in Google trace contain only one task, and about 98% of jobs contain less than 500 tasks. In Ali2018, about 99.6% of jobs contained less than 30 tasks. Moreover, our multifaceted analysis and new findings contribute new views about the research in the field of big workload trace data comparative analysis. The current version of our trace prototype tool provides the design where each module is communicated by the computing result value and the output figures. The further improvement of our algorithms, the unified seamless integration of all the modules with the total automatic execution of our big trace analysis process are on our on-going and future work list. The detailed correlation analysis of internal features and external features together with their influence on the performance variations are promising research direction,too.