A Review of Data Centers Energy Consumption and Reliability Modeling

Enhancing the efficiency and reliability of data centers are key technical challenges in maintaining the quality of service for end-users. The energy consumption models of data center components are pivotal for the optimal design of internal facilities and for limiting the energy consumption of the data center. Reliability modeling of the data center is also important, since end-user satisfaction depends on the availability of data center services. In this review, the state-of-the-art and the research gaps in data center energy consumption and reliability modeling are identified, which could benefit future research on data center design, planning, and operation. The energy consumption models of the data center components in the major load sections, i.e., information technology (IT), internal power conditioning system (IPCS), and cooling, are systematically reviewed and classified, which reveals the advantages and disadvantages of the models for different applications. Based on this analysis and related findings, it is concluded that the availability of the model parameters and variables is more important than accuracy, and that energy consumption models are often necessary for data center reliability studies. Additionally, a lack of research on IPCS consumption modeling is identified, although IPCS power losses could cause reliability issues and should be given due importance when designing the data center. The absence of a review on data center reliability analysis is also identified, which motivates this paper to review the aspects of data center reliability assessment that are needed to ensure the adoption of new technologies and equipment in the data center.
The state-of-the-art of reliability indices, reliability models, and methodologies is systematically reviewed in this paper for the first time, where the methodologies are divided into two groups, i.e., analytical and simulation-based approaches. The lack of research on the reliability analysis of the data center cooling section and on failure data for data center components are identified as research gaps. In addition, the dependency among the load sections is included in the reliability analysis, which shows that the service reliability of the data center is affected by the IPCS and the cooling section.


$E_{CPU}$
Probability of intended value of room temperature when the cooling system fails.
$P$ Correction factor of the server power consumption model.
$P_{active}$ Active state power consumption of the server.
$P_{base}$ Base power consumption of the server.
$P_{comp}$ Combined CPU and memory average power usage.
$P_{CPU}$ Power consumption of the CPU.
$P_{CRAC}^{cool}$ Power consumption of the CRAC unit.
$P_{disk}$ Power consumption of the disk drives.
$P_{Fan}^{j}$ Total power consumption of the local fans.
$P_{fix}$ Fixed power consumption of the server and the cooling system.
Power consumption by the input/output peripheral slot.
$P_{mb}^{i}$ Total power consumption or conduction loss of the mainboards.
$P_{mem}$ Power consumption of the memory units.
$P_{net,dev}$ Power consumption of the network devices.
$P_{NIC}$ Average power consumption of the network interface card.
$P_{PDU}^{idle}$ Idle power loss of the PDU.

I. INTRODUCTION
With the development of cloud-based services and applications, commercial cloud service providers such as Google, Facebook, and Amazon are deploying massive geo-distributed data centers. According to research conducted by the International Data Corporation (IDC), the global demand for data transfer and digital services is expected to double to 4.2 zettabytes per year, equivalent to 4,200 exabytes, by 2022 [1]. The number of data centers is increasing globally to handle this rapidly growing data traffic, and so is their energy demand. According to [2], US data centers handled about 300 million terabytes of data and consumed around 8.3 billion kWh in 2016, i.e., about 27.7 kWh per terabyte, with a carbon footprint of approximately 35 kg CO2 per terabyte of data. Data Center Frontier reported that the number of servers in data centers increased by 30% during 2010–2018 due to the growing demand for computational workloads [3]. Over the same period, the number of computational instances, including virtual machines running on the physical hardware, rose by 550%, data traffic climbed 11-fold, and installed storage capacity increased 26-fold [3]. Meanwhile, the global energy demand of data centers grew from 194 TWh to 205 TWh during 2010–2018 [3]. Additionally, data centers indirectly contribute to CO2 emissions through their growing energy demand; these emissions have been projected to reach up to 720 million tons by 2030 [4].
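The figures above rest on simple unit conversions and ratios. A minimal Python sketch that reproduces them follows; the function names are hypothetical and the inputs are the values cited from [1]–[3].

```python
ZB_TO_EB = 1_000  # decimal SI prefixes: 1 zettabyte = 1,000 exabytes


def zettabytes_to_exabytes(zb: float) -> float:
    """Convert zettabytes to exabytes."""
    return zb * ZB_TO_EB


def energy_intensity_kwh_per_tb(energy_kwh: float, data_tb: float) -> float:
    """Energy consumed per terabyte of data handled."""
    return energy_kwh / data_tb


def percent_growth(start: float, end: float) -> float:
    """Relative growth from start to end, in percent."""
    return 100.0 * (end - start) / start


if __name__ == "__main__":
    print(f"4.2 ZB = {zettabytes_to_exabytes(4.2):,.0f} EB")
    # US data centers in 2016: ~8.3 billion kWh for ~300 million TB [2]
    print(f"{energy_intensity_kwh_per_tb(8.3e9, 300e6):.1f} kWh/TB")
    # Global data center energy demand: 194 TWh (2010) to 205 TWh (2018) [3]
    print(f"{percent_growth(194.0, 205.0):.1f} % growth")
```

Running it prints 4,200 EB, 27.7 kWh/TB, and 5.7% growth. The last figure shows that energy demand rose far less than the server count (+30%) or the number of compute instances (+550%) over the same period.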
At present, the leading companies in the Information and Communication Technology (ICT) business are building their new data centers in high-latitude areas of the Arctic region to take advantage of natural conditions, including renewable energy production facilities, cold air, and appropriate humidity. Google built a data center in Hamina, Finland, in 2011 to use cold seawater from the Gulf of Finland and onshore wind energy, while Facebook moved to Sweden in 2013 and Ireland in 2016 to gain similar natural advantages in data center operation [4]. These companies use natural advantages to reduce the energy consumption of their data centers, thereby indirectly reducing their contribution to CO2 emissions. There have been two major phases of data center innovation in response to the challenges of energy efficiency. In the first phase, during 2007–2014, data center operators focused on improving the efficiency of Information Technology (IT) equipment and data center cooling facilities [3]. During this period, the Nordic region attracted significant data center investments owing to its environmental benefits. For example, after Google and Facebook entered the region in 2009 and 2011, respectively, the Nordic countries became a preferred location for an increasing number of data center investors. A report by Business Sweden estimates that the Nordics could attract data center investments on the order of EUR 2–4 billion by 2025. This estimate is based on the forecast worldwide demand for data center services and the corresponding data center investments in the Nordic countries [5]. In the second phase, large data center operators have focused on procuring renewable energy (i.e., wind and solar) instead of traditional power sources to supply their data center operations [3].
Data centers are opening new business opportunities while posing the following operational challenges:
• Increasing the energy efficiency of data centers to limit energy consumption and CO2 emissions, thereby reducing the operational cost of the data centers.
• Enhancing the service availability of the IT section, thereby enhancing the overall reliability of the data center to satisfy the Service Level Agreements (SLA) with the clients of the data center.
• Striking a strategic balance at the design stage between reducing energy consumption and ensuring higher reliability of the data center.
Energy consumption and reliability models of the data center are needed to address these operational challenges. Energy consumption models help predict the consequences of operational decisions, which leads to more effective management and control of the system [6]. Furthermore, reliability modeling of the individual data center load sections and reliability assessment of the data center as a whole are important to prevent unwanted service interruptions and to meet the committed SLA [7]. In some cases, the reliability assessment model also requires the energy consumption models of the devices in the load sections. For example, the power losses of the Internal Power Conditioning System (IPCS) are taken into account to assess the overall service availability of the IT loads in [8], while the energy consumption models of the cooling section devices are used for the reliability assessment of the cooling section in [9]. In this regard, the suitability of an energy consumption model is not determined by its accuracy and precision alone; the choice of modeling approach often depends on the application in which the energy or power consumption models of the components are used.
The purpose of this paper is to provide a review of the data center energy consumption modeling approaches and reliability modeling aspects that have been presented in the literature.

B. RESEARCH GAPS
The authors have reviewed 193 papers related to data center energy consumption and reliability modeling. Review articles on data center reliability modeling are lacking in the literature, although such a review is needed to establish the state-of-the-art of data center reliability analysis for further research. This paper attempts to fill this gap by analyzing the reliability modeling aspects, presenting the current knowledge base on the factors affecting data center reliability, reliability indices, and reliability assessment methodologies.
In addition, the energy and power consumption models of the data center loads are analyzed, and the advantages and disadvantages of applying these models in research are explained, which are largely missing from the literature. The power consumed by the devices in the IPCS is also considered and analyzed as a data center load section, like the IT and cooling load sections, which is missing from previous review articles. The trade-off between reducing energy consumption and ensuring higher reliability of the data center is an operational challenge that has not been properly addressed in the literature. This paper therefore gives recommendations to help future researchers fill these research gaps and make a trade-off between the reliability and energy efficiency of the data center.

C. OBJECTIVE AND APPROACH
Research interest in the energy-efficient and reliable operation of data centers has increased over the last few decades, as shown in Figure 1. However, the number of published articles on data center reliability is lower than the number of articles on data center energy efficiency, which indicates a need to review the state-of-the-art of data center reliability analysis. Moreover, the number of published articles on data center reliability analysis has declined since 2016, as shown in Figure 1. This lack of research on data center reliability modeling could impede the integration of new data center technologies, because the adoption of new technologies and equipment in an existing data center depends on their reliability, which demands further research [10]. Apart from reliability analysis, energy efficiency analysis is also important for integrating new technologies in data centers, since most new technologies bring additional environmental challenges for the cooling load section [10]. Especially in the context of the green data center, which refers to the energy-efficient operation of the IT and cooling loads [10], research on data center reliability and energy consumption modeling should be emphasized. Therefore, the objective of this article is to review the energy consumption models of the components in the major load sections of the data center, together with data center reliability modeling, including reliability indices, methodologies, and the factors that affect the reliability of the data center. This review could provide a starting point for further research on these topics.
The articles were searched in databases such as Google Scholar (http://scholar.google.com), Web of Science (https://apps.webofknowledge.com/), IEEE Xplore (http://ieeexplore.ieee.org), ACM Digital Library (http://dl.acm.org), Citeseer (http://citeseerx.ist.psu.edu), ScienceDirect (http://www.sciencedirect.com), SpringerLink (http://link.springer.com), and SCOPUS (http://www.scopus.com). The research trends on these topics, based on the search results, are shown in Figure 1. However, not all of the retrieved articles are reviewed in this paper, since its aim is to review the energy consumption models of the major components of the data center and the reliability modeling aspects of the data center. Therefore, the following keywords are used to filter the articles:
• Service availability and service reliability
• Reliability modeling approach
The Systematic Literature Review (SLR) methodological approach [11] is adopted in this paper with the above-mentioned keywords. All the relevant attributes of the selected papers are used to construct the knowledge base presented in this paper.

D. CONTRIBUTIONS & RECOMMENDATIONS
The contributions and recommendations based on the review of data center energy consumption modeling are as follows:
• This paper classifies and summarizes the published review articles based on their contributions to data center energy consumption modeling, and identifies the absence of review articles on data center reliability assessment. Therefore, the data center reliability modeling aspects are comprehensively reviewed in this paper.
• The power and energy consumption models of the components and equipment in the major load sections of the data center are reviewed. The proposed consumption models of the servers in the IT load section are classified into four groups depending on their mathematical formulation in the literature. The advantages, disadvantages, and applications of the server power consumption models are also presented.
• Besides the aforementioned applications, the energy consumption models of the data center load sections are often used for analyzing data center reliability. The analysis in this paper finds that the trade-off between the energy efficiency and the reliability of the data center has not been adequately addressed in research.
- Based on this analysis, the recommendation for future research on data center energy modeling is to choose energy consumption models of the equipment that suit the application. Model accuracy is often prioritized in research; however, it is found that the availability of the model parameters and variables is more important than accuracy for research applications. Energy consumption models whose parameters and variables are easily accessible or measurable in laboratory facilities offer simplicity and ease of use in research applications.
- More research should be conducted on the power losses and energy efficiency of the IPCS of the data center. Research articles present load models for the IT and cooling load sections, but this is not the case for the IPCS section, although the consumption of the IPCS section is found to be more than 10% of the total consumption.
The contributions and recommendations based on the analysis of data center reliability modeling and assessment techniques are as follows:
• This paper reviews the reliability modeling aspects related to the data center. The reliability indices and metrics for the IT, IPCS, and cooling load sections are analyzed, together with the reliability modeling methodologies. The reliability modeling methodologies are classified into two groups (i.e., analytical and simulation-based) depending on the modeling approach. This research identifies the state-of-the-art of the data center reliability modeling techniques studied so far, which could serve as a starting point for future researchers.
• The need for a standard code for data center operation, alongside the tier classification, is identified in this paper, since the failure mode and the degraded mode of a data center can affect reliability differently.
- The recommendation is to focus data center reliability studies on new equipment and topologies that use new technologies. The new technologies put more stress on the load sections, as explained in [10]. The lack of research on data center reliability aspects could hamper the development of the individual load sections and of the data center industry as a whole.
- The lack of research on the reliability of the data center cooling load is identified; thus, it is recommended to give more research focus to the reliability assessment of the cooling section.
- The availability of statistical failure data for data center components is important for reliability studies at different levels of data centers. Thus, data center owners/operators are recommended to publish statistical failure data for data center components to ensure adequate resources for further research.

E. ORGANIZATION
The paper is structured as follows: The contributions and remarks of the published related review papers are explained in Section II. Section III analyzes the energy consumption models of the data center's major load sections. Section IV discusses the reliability modeling aspects of the data center. The limitations and the future works are explained in Section V. Finally, Section VI concludes the article with recommendations and discussions based on the analysis.

II. RELATED REVIEW ARTICLES
According to the Web of Science, at least 56 review articles published between 2005 and 2020 present an overview of the energy and power consumption models of the data center load sections and of the applications of these models in the data center. The articles were retrieved from the database with the keywords "review OR overview OR survey data center energy consumption model". The published articles are classified into two categories: 1) energy efficiency techniques, from the component level to the data center system level, and 2) energy management techniques. The energy management techniques also include thermal environment design and management, air flow control including free cooling, thermal metrics, and thermal parameter optimization. Moreover, researchers' interest in these articles is increasing, as reflected by their growing number of citations. The review articles are analyzed based on their subjects of review and number of citations, as shown in Table 1 and Table 2. No review articles addressing "data center reliability or availability modeling" were found. However, research interest in data center reliability modeling is evident from the increasing number of published articles that address various aspects of data center reliability, as shown in Figure 1b, which also calls for an SLR on data center reliability modeling for future researchers.
A taxonomy based on the overview of the energy consumption and reliability modeling of the data center is shown in Figure 2.

III. REVIEW OF DATA CENTER LOAD MODELING
A data center accommodates ICT equipment that provides data storage, data processing, and data transport services [51]. Data centers typically have three major load sections: IT loads, cooling and environmental control equipment, and the internal power conditioning system, i.e., Uninterruptible Power Supply (UPS), Power Distribution Unit (PDU), and Power Supply Unit (PSU), including security and office support, as shown in Figure 3. The IT load section contains servers, storage, local cooling fans, network switches, etc. The data center also needs a power conditioning system together with cooling and environmental control to maintain adequate power quality and the required temperature for the IT loads [28], [52], [53]. The IT load section must be environmentally controlled, since it houses devices such as servers and network switches that generate a considerable amount of heat. IT devices are highly sensitive to temperature and humidity fluctuations, so a data center must maintain tightly controlled environmental conditions to ensure the reliable operation of its equipment [25]. Besides the IT and cooling load sections, the power conditioning section is another important part of the data center that also consumes power [8], [52]. The amount of power consumed by each load section depends on the design of the data center and the efficiency of the equipment. The largest power-consuming section in a typical data center is the IT load section, comprising IT equipment (45%) and network equipment (5%), while the cooling loads (38%) rank second, as shown in Figure 4 [24], [52]. Besides these two load sections, the power conditioning devices in the IPCS consume 8% of the total power of the data center, a share that has not been studied in depth in the existing literature.
VOLUME 9, 2021
However, every source of power consumption must be considered to properly model the power consumption of the entire data center, because a model is a formal abstraction of a real system [54]. The power consumption models of the load sections in a data center can be represented as equations, graphical models, rules, decision trees, sets of representative examples, neural networks, etc. [28]. The main applications of data center power consumption models are as follows.
• Design of the power supply system of a data center: The power consumption models of the load sections are necessary at the initial design stage of the IPCS in energy-intensive facilities such as data centers. It is not worth building an IPCS without prior knowledge of the energy demand of the load sections and the power losses of the system [55]. A simulation tool is proposed in [56] that evaluates the Power Usage Effectiveness (PUE) and other energy efficiency factors of data centers; it is applied in the Data Center Efficiency Building Blocks project to optimize the energy consumption of the data center considering its maximum loads, as explained in [56]. The power consumption models of the load sections can thus be a useful tool for designing the internal power supply infrastructure of the data center.
• Forecasting the energy consumption trends and enhancing the energy efficiency of data centers: Understanding the power consumption trends of the data center load sections is important for maximizing energy efficiency. In data center operation, real-time power measurements alone are not sufficient for decision making, so the predicted power consumption of the load sections is also needed [57]. The power consumption models of the data center components are used to predict the power consumption of the load sections in [58]. The forecasted power consumption trends of the load sections help data center operators optimize the overall consumption of the data center [59].
• Power consumption optimization: Different power consumption optimization models have been applied in data centers, using the power consumption models of the load sections, to ensure energy-efficient and cost-effective data center operation. In [10], [60], the power consumption models of the load sections are used to optimize the power consumption of the data center.
Modeling the exact power consumption behavior of a data center, either at the system level or at the individual component level, is not straightforward. The power consumption of a data center depends on multiple factors, such as the hardware specifications and internal infrastructure, computational workloads, the type of applications hosted, and the cooling requirements, which cannot be measured easily [10], [33]. Furthermore, the power consumption of the hardware in the IT load section, the cooling section, and the power conditioning infrastructure of the data center are closely coupled [61]. Component-level power consumption models help in activities such as new equipment procurement, system capacity planning, and resource expansion. The power consumption models of the different load sections are described in the remainder of this section.
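The PUE metric mentioned above is conventionally defined as total facility power divided by IT equipment power. As a worked example, the following minimal Python sketch applies this definition to the load-section shares of Figure 4. It assumes that IT equipment and network equipment together form the IT load, and that the unattributed remainder (for example lighting, security, and offices) counts only toward the facility total. The helper `pue` is hypothetical.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Shares of total data center power from Figure 4 [24], [52], in percent.
shares = {
    "it_equipment": 45.0,
    "network": 5.0,
    "cooling": 38.0,
    "ipcs": 8.0,
}
# Assumption: whatever is not attributed above belongs to other facility loads.
shares["other"] = 100.0 - sum(shares.values())

it_load = shares["it_equipment"] + shares["network"]
total = sum(shares.values())

if __name__ == "__main__":
    print(f"Unattributed share: {shares['other']:.1f} %")
    print(f"PUE implied by the breakdown: {pue(total, it_load):.2f}")
```

With these shares, the sketch reports a 4.0% unattributed share and a PUE of 2.00, meaning that every watt delivered to the IT load requires roughly one additional watt for cooling, power conditioning, and other facility loads.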

A. IT LOAD MODELS
Some of the components discussed in the IT load section may also appear at other levels of the data center hierarchy; however, all of them are attributed specifically to the IT loads of a data center. Traditionally, servers are the main computational resource in the IT load section. Other devices such as memory, storage, network devices, local cooling fans, and server power supplies are also considered IT loads in the literature. The most power-consuming components in the IT load section are the servers [39]–[41]. The percentage of power consumed by the components of the servers is shown in Figure 5 [23], [24]. The Central Processing Unit (CPU) is the largest contributor to the total server power consumption, followed by the peripheral slots (including the network card slot and input/output (I/O) devices), conduction losses, memory, motherboard, disk/storage, and cooling fan. Therefore, this paper emphasizes the server energy usage and power consumption models presented in the literature.
Server Consumption Model: The proposed power consumption models of the servers are classified into four groups based on the characteristics of the proposed power and energy consumption models, namely additive model, baseline-active model, regression model, and utilization-based model.

1) ADDITIVE POWER MODELS
Server power consumption models that are formulated as a summation of the power consumption of the server components belong to this group, as summarized in Table 3. The simplest server power consumption model was proposed in [62], considering the power consumption of the CPU and the memory unit. Later, other additive models were proposed that include additional components in the server power or energy consumption model, as shown in Table 3. Most of the proposed models, such as [62]–[64], approximate the power consumption of the server by that of the main-board (motherboard), whereas [65] addresses the consumption of the motherboard separately. The power consumption of the motherboard can be considered the conduction loss of the server, as shown in Figure 5. A further extension of the additive server power consumption model is presented in [66], where the overall power consumption of the server includes a base-level consumption, $P_{base}$, as shown in (1). $P_{base}$ accounts for the unaddressable power losses, including the idle power consumption of the server.

$$P_{server} = P_{base} + P_{CPU} + P_{disk} + P_{net} + P_{mem} \qquad (1)$$

The power modeling approach in (1) can be further expanded, considering that the energy consumption can be calculated by multiplying the average power by the execution time [64]:
A different version of (1) is obtained by considering the levels of resource utilization of the key components of a server [67]:
A similar energy consumption model of the entire server is described by Lewis et al. in [68]:

$$E_{server} = A_0\,(E_{proc} + E_{mem}) + A_1\,E_{em} + A_2\,E_{board} + A_3\,E_{hdd} \qquad (4)$$

where $A_0$, $A_1$, $A_2$, and $A_3$ are constants that are obtained via linear regression analysis and remain the same for a specific server architecture.
The terms $E_{proc}$, $E_{mem}$, $E_{em}$, $E_{board}$, and $E_{hdd}$ represent the total energy consumed by the processor, by the DDR and SDRAM chips, by the electromechanical components in the server blade, by the peripherals that support onboard operation, and by the hard disk drive, respectively. The close relation between CPU and memory energy consumption is captured by assigning the same constant $A_0$ to both.
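The additive idea can be illustrated with a minimal Python sketch: server power as the sum of component terms as in (1), energy as average power multiplied by execution time, and the Lewis-style energy model written as a linear combination in which $A_0$ multiplies both $E_{proc}$ and $E_{mem}$, as described above. The class and function names, the wattages, and the coefficients are hypothetical and are not taken from the cited works.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ComponentPower:
    """Average power draw (W) of each term in the additive model (1)."""
    base: float  # P_base: idle and unaddressable losses
    cpu: float   # P_CPU
    disk: float  # P_disk
    net: float   # P_net
    mem: float   # P_mem


def server_power(p: ComponentPower) -> float:
    """Additive model: P_server = P_base + P_CPU + P_disk + P_net + P_mem."""
    return p.base + p.cpu + p.disk + p.net + p.mem


def server_energy_joules(p: ComponentPower, exec_time_s: float) -> float:
    """Energy as average power multiplied by execution time (W * s = J)."""
    return server_power(p) * exec_time_s


def lewis_energy(a: Tuple[float, float, float, float], e_proc: float,
                 e_mem: float, e_em: float, e_board: float,
                 e_hdd: float) -> float:
    """Linear energy model; CPU and memory share the coefficient A_0."""
    a0, a1, a2, a3 = a
    return a0 * (e_proc + e_mem) + a1 * e_em + a2 * e_board + a3 * e_hdd


if __name__ == "__main__":
    # Illustrative wattages only; not measurements from the cited works.
    p = ComponentPower(base=60.0, cpu=95.0, disk=8.0, net=5.0, mem=22.0)
    print(server_power(p))                  # 190.0
    print(server_energy_joules(p, 3600.0))  # 684000.0 (J), i.e. 0.19 kWh
```

In practice, the coefficients $A_0$ to $A_3$ would be obtained by regression against measurements for a given server architecture, as stated above.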

2) BASELINE-ACTIVE (BA) POWER MODEL
In data centers, servers do not always remain in the active state, as they can also be switched to the idle mode. Therefore, the power consumption of a server can be divided into two parts: (i) baseline power ($P_{base}$) and (ii) active power ($P_{active}$). The baseline power, $P_{base}$, is the idle power consumption of the server, which includes the power consumption of the fans, CPU, memory, I/O, and other motherboard components in their idle state; it is often considered a fixed value [73], [74]. $P_{active}$ is the power consumption of the server that depends on the computational workload, and hence on the utilization of server resources (i.e., CPU, memory, I/O, etc.). Therefore, the power consumption model can be expressed as the sum of the baseline power and the active power, as given in (5). Similar server power consumption models following the Baseline-Active (BA) modeling approach are presented in Table 4.
where $P$ is the correction factor of the server power consumption model, which can be either a fixed value or an expression. The active state power consumption of the server, $P_{active}$, in the BA models can be expressed as a function of server utilization, coolant pump power consumption, the Virtual Machine (VM) utilization factor, etc., as depicted in Table 4.
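A minimal Python sketch of the baseline-active split follows. For illustration only, it assumes that $P_{active}$ grows linearly with utilization, from zero at idle to the difference between a peak power and $P_{base}$. The correction factor of (5) is left out because its form differs between works. The function names and wattages are hypothetical.

```python
from typing import Iterable


def active_power_linear(utilization: float, p_base_w: float,
                        p_peak_w: float) -> float:
    """Hypothetical active term: rises linearly from 0 at idle to
    (p_peak_w - p_base_w) at full utilization (utilization in [0, 1])."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return (p_peak_w - p_base_w) * utilization


def ba_server_power(p_base_w: float, p_active_w: float) -> float:
    """Baseline-active model without the correction factor: P_base + P_active."""
    return p_base_w + p_active_w


def rack_power(utilizations: Iterable[float], p_base_w: float,
               p_peak_w: float) -> float:
    """Sum of BA power over identical servers (homogeneous servers assumed)."""
    return sum(
        ba_server_power(p_base_w, active_power_linear(u, p_base_w, p_peak_w))
        for u in utilizations
    )


if __name__ == "__main__":
    # Illustrative values: 100 W baseline, 250 W peak.
    print(ba_server_power(100.0, active_power_linear(0.5, 100.0, 250.0)))  # 175.0
    print(rack_power([0.0, 0.5, 1.0], 100.0, 250.0))  # 525.0
```

The rack example shows the main consequence of the BA split: an idle server (utilization 0) still draws its full baseline power. This baseline draw is why consolidating workloads and switching idle servers off can save energy.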

3) REGRESSION MODELS
Regression models of server power consumption consider the correlation between the power consumption and the performance counters of the functional units of the server, i.e., CPU, memory, storage, etc. These models capture the fixed or idle power consumption and the dynamic power consumption that changes with activity across the functional units of the server. Regression-based server power consumption models are also known as 'Power Law models', which became popular in data center applications during 2010–2014. Regression models are widely adopted in research because of their simplicity and interpretability; however, they are not suitable for tracking server power consumption in cloud interfaces, since server workloads fluctuate frequently [77]. The accuracy of the regression models is analyzed in [78], which reports that they can predict dynamic power usage well, with an error below 5%, whereas the error can be around 1% for non-linear models, depending on the use case [23]. In this paper, the regression models of the servers are classified into three groups (i.e., simple regression models, multiple regression models, and non-linear models).

• Simple regression model
The correlation between power consumption and the performance counters that capture CPU activity was first proposed in [79], while the mathematical model was first presented in [78], as given in (6). The power consumption model presented in [78] was also validated by experimental results.
A similar model for cloud-based systems with VMs is presented in [80], which allows different independent variables to be used for different application scenarios:
Similar simple regression models presented in different articles are summarized in Table 5.
The simple regression models shown in (6) and (7) are based on CPU utilization but address it from different points of view. These power consumption models can provide reasonable accuracy for CPU-intensive workloads; however, they cannot capture the change in server power consumption caused by I/O- and memory-intensive applications [73].
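To make the simple regression idea concrete, a minimal Python sketch follows. It assumes the widely used linear form $P(u) = P_{idle} + (P_{busy} - P_{idle})\,u$ in CPU utilization $u$. Equation (6) itself is not reproduced here, so this form is an assumption for illustration. The two parameters are fitted by ordinary least squares from (utilization, power) samples. The function names and sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_power_model(u: Sequence[float],
                           p: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of P(u) = P_idle + (P_busy - P_idle) * u.

    u: CPU utilization samples in [0, 1]; p: measured server power (W).
    Returns (P_idle, P_busy)."""
    n = len(u)
    if n != len(p) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_u = sum(u) / n
    mean_p = sum(p) / n
    s_uu = sum((x - mean_u) ** 2 for x in u)
    if s_uu == 0:
        raise ValueError("utilization samples must not all be equal")
    s_up = sum((x - mean_u) * (y - mean_p) for x, y in zip(u, p))
    slope = s_up / s_uu               # estimate of P_busy - P_idle
    p_idle = mean_p - slope * mean_u  # intercept at zero utilization
    return p_idle, p_idle + slope


def predict_power(u: float, p_idle: float, p_busy: float) -> float:
    """Evaluate the fitted linear model at utilization u."""
    return p_idle + (p_busy - p_idle) * u


if __name__ == "__main__":
    # Synthetic samples generated from P = 100 + 150 u (illustrative only).
    u = [0.0, 0.25, 0.5, 0.75, 1.0]
    p = [100.0, 137.5, 175.0, 212.5, 250.0]
    p_idle, p_busy = fit_linear_power_model(u, p)
    print(round(p_idle, 3), round(p_busy, 3))  # 100.0 250.0
```

Because the only input is CPU utilization, a memory- or I/O-bound workload that changes power at constant CPU load would show up purely as residual error, which is the limitation noted above.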
• Multiple regression model
The server power consumption model as a function of the utilization of the CPU, memory, disk, and network devices is presented in [84], as shown in (8). It assumes that subsystems such as the CPU, disk, and I/O ports show linear power consumption with respect to their individual utilization, as discussed in [85]. [86] to achieve a more accurate power prediction, as shown in (9). Note that $n_{VM}$ in (9) is the number of VMs running on a server, which are assumed to be homogeneous in configuration; thus, the weights α, β, γ, and e are the same for each VM. The proposed model considers the components of the server to be connected as building blocks of the server, which is valid for blade servers. It also assumes that the subsystems show linear power consumption with respect to their individual utilization, as shown in (9). In contrast, Kansal et al. proposed a more detailed model of server power consumption in [87], considering CPU utilization, the number of Last Level Cache (LLC) misses, and the number of bytes read and written, as shown in (10). These two consumption models are essentially the same, except for the additional term $N_{LLCM}$, as the comparison of (9) and (10) shows. The term $N_{LLCM}$ represents the number of LLC misses during $T$, and $\alpha_{mem}$ and $\gamma_{mem}$ are the linear model parameters. A more generalized power consumption model is presented in [88] based on the performance counters of the server components (i.e., CPU cycles per second, references to the cache per second, and cache misses per second), as given in (11). Later, the model in (11) was further extended by Witkowski et al. [89] to include the CPU temperature.
where the power consumption of a server is expressed as a combination of variables $Y_i$, $i = 1, \dots, I$, and $X_{jl}$, $j = 1, \dots, J$, describing individual processes $l = 1, \dots, L$. The power consumption of a server with no load is denoted by $P_0$ (the intercept), and $\alpha_i$ and $\beta_j$ are the respective coefficients of the regression model. The ambient temperature, the CPU die temperature, and the memory and hard disk consumption, including the energy consumed by the electro-mechanical components, are added to the regression model in [90], as shown in (12).
These models can predict the energy consumption accurately as long as the workload pattern does not change.
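The following sketch shows how the weights of an additive, component-level linear model such as (8) could be identified. It fits $P = P_0 + \sum_k w_k U_k$ by ordinary least squares with NumPy on noise-free synthetic measurements. The four utilization columns (CPU, memory, disk, network), the coefficient values, and the function names `fit_component_model` and `predict_power` are hypothetical and are not taken from [84]-[90].

```python
import numpy as np

def fit_component_model(util, power):
    """Least-squares fit of P = P_0 + sum_k w_k * U_k.

    util:  (n_samples, n_components) utilizations in [0, 1]
    power: (n_samples,) measured server power in W
    Returns (p_0, weights).
    """
    util = np.asarray(util, dtype=float)
    power = np.asarray(power, dtype=float)
    # A leading column of ones lets the fit also estimate the intercept P_0.
    design = np.column_stack([np.ones(len(util)), util])
    coeffs, *_ = np.linalg.lstsq(design, power, rcond=None)
    return coeffs[0], coeffs[1:]

def predict_power(p_0, weights, util):
    """Evaluate the additive linear model for one or more utilization rows."""
    return p_0 + np.asarray(util, dtype=float) @ np.asarray(weights, dtype=float)

# Synthetic data: columns are CPU, memory, disk, and network utilization.
rng = np.random.default_rng(0)
U = rng.uniform(0.0, 1.0, size=(50, 4))
true_p0, true_w = 100.0, np.array([120.0, 30.0, 15.0, 10.0])  # hypothetical
P = predict_power(true_p0, true_w, U)
p_0, w = fit_component_model(U, P)
```

With real measurements, the same fit would be run on logged utilization counters and wall power, and the residuals would show how far the linearity assumption holds.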
• Non-Linear Models
A non-linear model that includes a calibration parameter r, chosen to minimize the squared error, is proposed in [78], as shown in (13). The parameter r must be obtained experimentally since it depends on the type of server. The same model is also used in [83], [91]. The power model in (13) projects the power consumption of the servers better than the regression models [78]; however, the need to determine the calibration parameter r is a disadvantage of the model. Meanwhile, Zhang et al. [58] used high-degree polynomial models to fit the server power consumption and found that the cubic polynomial model in (14c) is the best choice compared to (14a) and (14b). Similarly, a second-order polynomial relationship between power consumption and server utilization is provided in [92].
where R is the resource utilization and a, b, c, and d are the constants of the polynomial fit.
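The calibration of r in (13) and the polynomial alternative in (14) can be sketched as follows. The sketch assumes that (13) takes the widely cited form $P(u) = P_{idle} + (P_{busy} - P_{idle})(2u - u^r)$ and that (14a)-(14c) are polynomials of increasing degree in R with constants a, b, c, d. The idle and busy powers, the grid of candidate r values, and the synthetic measurements are hypothetical.

```python
import numpy as np

def fan_nonlinear(u, p_idle, p_busy, r):
    """Non-linear model P(u) = P_idle + (P_busy - P_idle) * (2u - u**r)."""
    u = np.asarray(u, dtype=float)
    return p_idle + (p_busy - p_idle) * (2.0 * u - u ** r)

def calibrate_r(u, p_measured, p_idle, p_busy, candidates=None):
    """Grid search for the r that minimizes the squared error."""
    if candidates is None:
        candidates = np.linspace(1.0, 2.0, 101)  # step of 0.01
    errors = [np.sum((fan_nonlinear(u, p_idle, p_busy, r) - p_measured) ** 2)
              for r in candidates]
    return float(candidates[int(np.argmin(errors))])

def fit_polynomial(u, p_measured, degree):
    """Least-squares polynomial fit; coefficients ordered a, b, c, d (lowest first)."""
    return np.polynomial.polynomial.polyfit(u, p_measured, degree)

u = np.linspace(0.0, 1.0, 21)
p_meas = fan_nonlinear(u, 150.0, 250.0, 1.4)  # synthetic "measurements"
r_hat = calibrate_r(u, p_meas, 150.0, 250.0)
cubic = fit_polynomial(u, p_meas, 3)          # analogue of the cubic model
```

A grid search is used here for transparency; any one-dimensional optimizer would also work. Because the polynomial fits are nested least-squares problems, the cubic fit cannot have a larger squared error than the linear one on the same data. Whether it generalizes better has to be checked on held-out measurements.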

4) UTILIZATION-BASED POWER MODEL
Most utilization-based power models use CPU utilization as the metric for modeling the power consumption of the entire server, since the CPU is the most power-consuming component in the server, as shown in Figure 5. One of the earliest CPU utilization-based server power models appeared in [93], as shown in (15). It is an extension of the basic digital circuit power model given in (16). $P_{dyn}$ is the dynamic power consumption of any circuit caused by capacitor switching, where A denotes the switching activity (i.e., the number of switches per clock cycle), C the physical capacitance, V the supply voltage, and f the clock frequency. Various techniques can be applied to scale the supply voltage and frequency over a wide range, as captured by (15).
It is worth mentioning that the voltage is proportional to the frequency, i.e., V = (constant) × f [93]. In (15), the constant $c_0$ includes the power consumption of all components except for the idle power consumption of the CPU. The term $c_1 = AC(V/f)^2$ is obtained from (16), where A and C are the switching activity (i.e., the number of switches per clock cycle) and the physical capacitance, respectively.
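The step from (16) to the frequency dependence of (15) can be made explicit. This assumes, as stated above, that V/f is constant, and that (15) takes the cubic form $P = c_0 + c_1 f^3$; the exact shape of (15) is an assumption here. The derivation then reads:

$$P_{dyn} = A\,C\,V^{2} f = A\,C\left(\frac{V}{f}\right)^{2} f^{3} = c_{1} f^{3}, \qquad c_{1} = A\,C\left(\frac{V}{f}\right)^{2},$$

$$P = c_{0} + c_{1} f^{3}.$$

Because $c_1$ is constant when V/f is fixed, halving f reduces the dynamic term to one eighth of its original value.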
Later, in 2007, another notable CPU utilization-based power model was presented in [78], as given in (17). This model has significantly influenced recent research on data center power consumption modeling. It can track dynamic power usage accurately at the PDU level [94], [95]. Owing to its mathematical formulation, it also belongs to the class of simple regression models, as shown in (6).
This model assumes a linear relationship between the server power consumption and the CPU utilization. This empirical model has been used to represent the total power consumption of a server in [58], [96]. However, some studies define a different utilization metric for the server power consumption model. In [81], utilization is defined as the ratio of the actual number of connections made to a server to the maximum number of connections allowed on the server. This metric is used in a specific use case to model the power consumption of a content delivery network server. A non-linear server power model based on CPU utilization is proposed in [83], [91], as shown earlier in (13).
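The linear model (17) and the connection-based utilization of [81] can be combined in a few lines. The sketch assumes that (17) has the form $P(u) = P_{idle} + (P_{busy} - P_{idle})\,u$. The 100 W idle power, the 200 W busy power, and the connection counts are hypothetical.

```python
def linear_server_power(u, p_idle, p_busy):
    """Linear model P(u) = P_idle + (P_busy - P_idle) * u, with u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return p_idle + (p_busy - p_idle) * u

def connection_utilization(active_connections, max_connections):
    """Utilization as active / maximum allowed connections (CDN use case)."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return min(active_connections / max_connections, 1.0)

# CPU-based utilization.
p_cpu = linear_server_power(0.5, 100.0, 200.0)      # 150.0 W

# Connection-based utilization, as for a CDN server.
u_cdn = connection_utilization(300, 1200)           # 0.25
p_cdn = linear_server_power(u_cdn, 100.0, 200.0)    # 125.0 W
```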
Importance of Server Power Consumption Model: According to [97], saving 1 W of power at the CPU level can translate into 1.5 W of savings at the server level and up to 3 W at the overall data center system level. The overall power consumption of the IT equipment can be reduced by lowering the power consumption of individual devices or by distributing the workload across server clusters [98]- [100]. Thus, server power consumption models are important for ensuring the cost-effective operation of the data center. In applications of power consumption models, accuracy and simplicity are the main requirements, but the two conflict with each other [23]. For example, a simple regression model of server power consumption is used to obtain the power consumption of the IT load, which is then used to assess the reliability and the impact of voltage dips in the IPCS [8], [101]. In contrast, higher-order regression models of server power consumption (i.e., quadratic and polynomial) are more complicated than the linear models. Such a higher-order regression model is used in [58] to improve the power efficiency of the servers by scheduling tasks in a cloud interface, where the authors prioritized accuracy over simplicity. Thus, the trade-off between the accuracy and the simplicity of server consumption models depends on the application. The applications, advantages, and disadvantages of the analyzed modeling approaches are summarized in Table 6.

B. INTERNAL POWER CONDITIONING SYSTEM MODEL
The IPCS of a data center consists of the UPS, PDU, and PSU, including the protection and power flow control devices (circuit breakers, automatic transfer switches, bypass switches, etc.). The IPCS ensures the voltage quality and reliability of the power supplied to the IT load section, which guarantees the desired QoS [8], [18]. The IPCS consumes a significant amount of power during the voltage transformation process, and this consumption is treated as power losses in [8], [52]. In a typical data center power hierarchy, a primary switchboard distributes power among multiple online UPSs. Each UPS supplies a collection of PDUs. A PDU feeds the IT load of the servers in a rack through the PSUs located in the rack. A rack contains several chassis that host individual servers. The general representation of the IPCS is shown in Figure 6 and is explained in [8], [111], [112].
The PDU transforms the supplied high AC voltage to lower AC voltage levels to distribute the power among the racks through the connected PSUs. The PDUs are fed by the UPSs, while the UPSs are typically connected to the utility supply and the backup generators, as shown in Figure 6. Depending on the region, the voltage supplied to the data center can vary from 480 V AC to 400 V AC. This voltage needs to be stepped down before being distributed among the racks [114]. The PDU works as a power converter that maintains adequate voltage quality for the rack supply. The PSUs at the racks rectify the supplied voltage for the servers using a Switch Mode Power Supply Unit (SMPSU) [8]. Power electronic devices with high-frequency switching, such as PDUs, incur a constant power loss as well as a power loss proportional to the square of the server load [52], as shown in (18). A PDU typically consumes 3% of its input power [52], [115]. In current practice, all PDUs remain connected to the supply system, which increases the idle loss of the PDUs [115]. The power loss coefficient of the PDU is denoted by $\varphi_{PDU}$ in (18), as explained in [28], [115].
FIGURE 6. An example of the internal power conditioning system [113].
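The following is a minimal sketch of the PDU loss model in (18). It assumes the form $P^{loss}_{PDU} = P^{idle}_{PDU} + \varphi_{PDU}\left(\sum P_{server}\right)^2$, i.e., a constant term plus a term quadratic in the server load, as described above. The idle loss and the value of $\varphi_{PDU}$ are hypothetical placeholders, not the coefficients used in [28], [115].

```python
def pdu_loss_w(server_powers_w, p_idle_w, phi_per_w):
    """Loss of one PDU: constant idle term plus a term quadratic in served load.

    server_powers_w: power drawn by each server fed by this PDU (W)
    p_idle_w:        idle (no-load) loss of the PDU (W)
    phi_per_w:       loss coefficient phi_PDU (1/W), so phi * P**2 is in W
    """
    load_w = sum(server_powers_w)
    return p_idle_w + phi_per_w * load_w ** 2

# Hypothetical coefficients for a rack of 10 servers.
full = pdu_loss_w([1000.0] * 10, p_idle_w=100.0, phi_per_w=1e-5)  # about 1100 W
half = pdu_loss_w([500.0] * 10, p_idle_w=100.0, phi_per_w=1e-5)   # about 350 W
```

Halving the rack load cuts the load-dependent term to a quarter, while the idle term remains. This is why PDUs that stay connected at low load still incur losses.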
The UPSs provide backup support during power supply interruptions lasting up to some tens of minutes, as well as during voltage dips and other disturbances originating upstream of the UPS. Different types of UPS have been studied to evaluate their efficiency and performance for specific uses. Online UPSs are claimed to be the most reliable choice for data center applications because of their fast response time [114], [115]. Recent work on the internal topology of the online UPS has aimed to improve its power quality [116], [117], efficiency [118], and performance [119], [120]. However, research on the power consumption or loss modeling of the UPS for data center applications is very limited. A UPS power consumption model that depends on the supplied IT load is proposed in [115] and was later used in [28], [52], [121]. The power consumed by the UPSs depends on the supplied power regardless of the topology, as shown in (19). The power consumption of the PSU depends on the power it supplies to the server [52], [53], [111]. The efficiencies of the PDU and the PSU are compared at different data center voltage levels in [114], where the efficiency of each unit is calculated from its input and output power. For a 480 V AC data center system, the efficiency of the PSU (87.56%) is lower than that of the PDU (94.03%) [114]. However, the total power consumption of all PDUs in the IPCS is higher than that of all PSUs [113]. This is because of the idle power loss and the non-linear relation between the PDU's loading and its power loss, as shown in (18).
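The UPS model (19) and the efficiency calculation used in [114] can be sketched as below. The linear form of the UPS loss (an idle term plus a fraction of the supplied power) is an assumption, as are the idle loss and coefficient in the example. The PSU and PDU efficiencies are the 480 V AC values quoted above.

```python
def ups_loss_w(supplied_power_w, p_idle_w, phi):
    """Assumed linear UPS loss: idle term plus a fraction phi of the supplied power."""
    return p_idle_w + phi * supplied_power_w

def efficiency(p_out_w, p_in_w):
    """Efficiency of a unit from its measured output and input power."""
    if p_in_w <= 0:
        raise ValueError("input power must be positive")
    return p_out_w / p_in_w

def input_power_w(p_out_w, eta):
    """Input power needed to deliver p_out_w at efficiency eta."""
    return p_out_w / eta

# Reported 480 V AC efficiencies from [114], each unit delivering 1 kW.
psu_in = input_power_w(1000.0, 0.8756)
pdu_in = input_power_w(1000.0, 0.9403)

# Hypothetical UPS: 2 kW idle loss, 5% load-dependent loss, 100 kW supplied.
ups = ups_loss_w(100_000.0, 2_000.0, 0.05)
```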
A comparative study in [52] evaluates the performance of these devices in the IPCS in terms of the power consumed by the IT loads in the data center. In [52], the PDUs are claimed to be the most power-consuming equipment in the IPCS compared to the UPS and PSU, and their losses can even lead to outages, as explained in [8]. Due to the series power loss component of the PDU, represented by the square term in (18), the total power loss of the PDUs exceeds the total power loss of the UPSs and PSUs [8], [52]. However, efficiency comparisons show that the PDU is more efficient than the UPS [114], [115]. Figure 7 shows the power consumption of the devices in the IPCS as a percentage of the served IT load for a hyperscale data center. The analysis is based on the idle power consumption and the power loss coefficients of the UPS and PDU reported in [112], [121]. The data center is assumed to host 10,000 servers, each with a rated power of 1 kW. An IPCS configuration similar to that in Figure 6 is considered, where each rack of 10 servers requires a PDU to distribute the power among the connected PSUs in the rack. The data center is therefore simulated with 10,000 servers in 1,000 racks that require 10 MW of power. The racks are assumed to be supplied by 10 identical UPS units. The devices in the IPCS consume 1,301 kW to serve the 10 MW IT load, which is around 13% of the power consumed by the IT loads, as depicted in Figure 7a. The power consumed by the devices in the IPCS is considered to be the power loss in the IPCS [113]. In the assessed data center, assuming full computational load on the servers, the PDUs consume 7.3% of the power consumed by the IT loads while the UPSs consume 4.7%, as shown in Figure 7b.
The power loss of the PSU is assumed to be 1% of the power supplied to the servers, as the PSU loss is load dependent [113]. This analysis also shows that the total power loss of the PDUs is higher than that of the UPSs, as claimed in [8], [52].
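The aggregation behind an analysis like Figure 7 can be sketched for the configuration described above: 10,000 servers of 1 kW, 10 servers per rack with one PDU each, 10 identical UPSs, and a PSU loss of 1%. The following assumptions apply:

- The loss coefficients in the hypothetical `IpcsParams` class are placeholders rather than the values from [112], [121].
- The UPS carries all downstream load and losses.

The output is therefore not expected to reproduce the 1,301 kW reported above.

```python
from dataclasses import dataclass

@dataclass
class IpcsParams:
    """Hypothetical loss coefficients; not the values from [112], [121]."""
    pdu_idle_w: float                # idle loss per PDU (W)
    pdu_phi_per_w: float             # quadratic PDU loss coefficient (1/W)
    ups_idle_w: float                # idle loss per UPS (W)
    ups_phi: float                   # linear UPS loss coefficient (-)
    psu_loss_fraction: float = 0.01  # PSU loss as a share of server power

def ipcs_losses(n_servers, server_w, servers_per_rack, n_ups, p):
    """Aggregate IPCS losses for identical, fully loaded servers.

    Topology: UPS -> one PDU per rack -> rack PSUs -> servers.
    """
    it_w = n_servers * server_w
    n_racks = n_servers // servers_per_rack
    rack_w = servers_per_rack * server_w
    psu_w = p.psu_loss_fraction * it_w
    # PDU loss follows the idle + quadratic form, driven by the rack server load.
    pdu_w = n_racks * (p.pdu_idle_w + p.pdu_phi_per_w * rack_w ** 2)
    # Each UPS carries an equal share of everything downstream of it.
    ups_load_w = (it_w + psu_w + pdu_w) / n_ups
    ups_w = n_ups * (p.ups_idle_w + p.ups_phi * ups_load_w)
    total_w = pdu_w + ups_w + psu_w
    return {
        "it_w": it_w, "n_racks": n_racks, "pdu_w": pdu_w,
        "ups_w": ups_w, "psu_w": psu_w, "total_w": total_w,
        "loss_share_of_it": total_w / it_w,
    }

params = IpcsParams(pdu_idle_w=150.0, pdu_phi_per_w=4e-6,
                    ups_idle_w=10_000.0, ups_phi=0.03)
result = ipcs_losses(n_servers=10_000, server_w=1_000.0,
                     servers_per_rack=10, n_ups=10, p=params)
```

To approach the breakdown in Figure 7, the placeholders would be replaced by the reported coefficients. The structure of the calculation would stay the same.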

C. COOLING SECTION MODELS
The cooling and environmental control system maintains the temperature and humidity of the data center. This section mainly comprises the Computer Room Air Conditioning (CRAC) units, cooling towers, humidifiers, pumps, etc., which ensure a reliable coolant flow in the data hall. The highly dense IT loads generate an enormous amount of heat in the data center, which is handled by the cooling load section. The cooling loads provide environmental control, and the IPCS ensures the power quality of the supply to the IT loads; both load sections are needed to ensure uninterrupted service of the IT loads. In a typical data center, the cooling load section is the largest power consumer among the non-IT load sections, followed by the power conditioning system losses, as shown in Figure 4. The energy consumption models of the cooling section have various applications in data center operation, e.g., cooling energy management, optimization, utilization of the generated heat, and thermal control. Power consumption models of the cooling components are essential for these applications.
The power consumption of the cooling load section depends on multiple factors, such as the layout of the data center, the spatial allocation of the computing power, the airflow rate, and the efficiency of the CRAC [112], [122]- [124]. There are two major working components in the cooling section: (1) the CRAC unit and (2) the chiller or cooling tower.

1) CRAC UNIT MODELS
The CRAC has recently drawn attention for its role in efficiently handling the coolant flow in data centers [125]- [127]. The heat generated by the servers, i.e., the IT loads in the data center, is removed by the CRAC units installed in the server room. In [125], the cooling power consumed by the CRAC units is modeled as a function of the supplied coolant temperature $t_s$ and the coefficient of performance $C_{CoP}$, as shown in (20). The authors of [125] use the HP CRAC model with $C_{CoP} = 0.0068\,t_s^2 + 0.0008\,t_s + 0.458$, where $t_s$ is the maximum temperature of the coolant supplied by the CRAC. The maximum efficiency of the CRAC unit can be achieved by finding the maximum value of $t_s$ that still guarantees the reliable operation of the servers [125].
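The HP CRAC model can be evaluated directly. This sketch assumes that (20) expresses the cooling power as the removed heat divided by $C_{CoP}(t_s)$, with $t_s$ in °C. The heat load (a quantity not named in the text above) and the two supply temperatures are hypothetical.

```python
def crac_cop(t_supply_c):
    """HP CRAC coefficient of performance as a function of supply temperature (deg C)."""
    return 0.0068 * t_supply_c ** 2 + 0.0008 * t_supply_c + 0.458

def crac_power_w(heat_removed_w, t_supply_c):
    """Cooling power, assumed to be the heat removed divided by the CoP."""
    return heat_removed_w / crac_cop(t_supply_c)

p_15 = crac_power_w(100_000.0, 15.0)  # CoP of about 2.0, so about 50 kW
p_20 = crac_power_w(100_000.0, 20.0)  # higher CoP, lower cooling power
```

Raising $t_s$ from 15 °C to 20 °C increases $C_{CoP}$ and lowers the cooling power. This is the lever exploited in [125], subject to the servers' safe operating temperature.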
Another power consumption model of the CRAC system is presented in [128]. There, the CRAC is assumed to be equipped with variable frequency drives (VFDs), which yields an empirical relationship between the power consumption $P_{crac,i}$ of an individual CRAC unit i and its relative fan speed $\theta_i$. The impact of rack arrangement, ambient temperature, outside temperature, and humidity on the power consumption of the CRAC unit is analyzed in [129]. As explained in [129], the power consumption of the CRAC is proportional to the volume of the airflow f and also depends on the heat generated by the servers, as shown in (21), (22). The required volume of airflow in a server room can be determined by $f = f_{max} \times U$, where $f_{max}$ is the maximum standard airflow (14,000 m³/h for a 7.5 kW CRAC unit). The power required to remove the heat from the server room, $P_{heat}$, is shown in (22), where the idle power of the CRAC unit, $P^{idle}_{CRAC}$, can be taken as 7% to 10% of $P^{max}_{sf}$.
Recently, a power consumption model of the CRAC dominated by fan power was presented in [52], as shown in (23). In this model, the fan power grows with the cube of the mass flow rate up to some maximum ($P^{Dyn}_{CRAC}$), and a constant power is consumed by sensors and control systems ($P^{Idle}_{CRAC}$). Some CRAC units are cooled by air rather than chilled water, or they include other features such as humidification systems; these are not considered in this model.
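The airflow relation $f = f_{max} \times U$ from [129] and the cube-law fan description of (23) can be sketched together. The sketch assumes that (23) has the form $P_{CRAC} = P^{Idle}_{CRAC} + P^{Dyn}_{CRAC}\,(f/f_{max})^3$ and treats U as a utilization in [0, 1]. It uses the volumetric flow ratio in place of the mass flow ratio, which is equivalent for constant air density. The idle and dynamic power values are hypothetical.

```python
def required_airflow_m3h(u, f_max_m3h=14_000.0):
    """f = f_max * U; 14,000 m^3/h is the figure quoted for a 7.5 kW CRAC unit."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("U must lie in [0, 1]")
    return f_max_m3h * u

def crac_fan_power_w(flow_m3h, flow_max_m3h, p_idle_w, p_dyn_max_w):
    """Idle (sensors/controls) power plus fan power growing with the cube of flow."""
    ratio = min(flow_m3h / flow_max_m3h, 1.0)  # flow is capped at the maximum
    return p_idle_w + p_dyn_max_w * ratio ** 3

flow = required_airflow_m3h(0.5)                          # 7000.0 m^3/h
power = crac_fan_power_w(flow, 14_000.0, 600.0, 6900.0)   # 1462.5 W
```

Because of the cubic term, running the fan at half of its maximum flow draws only one eighth of its dynamic power.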
In contrast, the power consumption of the CRAC in data centers is addressed in terms of thermal management in [126], [130], [131]. There, the authors relate the power consumption of the CRAC to the temperature of the data hall and the heat generated by the IT load section in order to optimize the power consumption of the cooling load section.

2) CHILLER AND COOLING TOWER POWER CONSUMPTION MODEL
Little work has addressed chiller power consumption for the specific use case of data centers. The chiller plant removes heat from the warm coolant that returns from the server room, and this heat is transferred to external cooling towers using a compressor. The chiller plant's compressor accounts for the majority of the overall cooling power consumption in most data centers [128]. The power drawn by the chiller depends on the amount of heat extracted, the chilled water temperature, the water flow rate, the outside temperature, and the outside humidity. According to [18], the chiller's power consumption increases quadratically with the amount of heat to be removed and thus with the data center utilization. The size of the chiller plant depends on the maximum heat generated by the IT load section. According to design practice, the chiller should handle at least 70% of $P^{max}_{sf}$ to provide sufficient cooling [132]. The chiller plant power consumption model is shown in (24). Another chiller power consumption model, given in [128], depends on the power consumption of the refrigeration system $P_r$, as shown in (25). The constants α, β, and γ are obtained by curve fitting of several samples from a real data center.
where η and U are the efficiency of the chiller system and the average utilization of the servers in the IT load section, respectively.
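The curve-fitting step behind (25) can be sketched with NumPy's `polyfit`. The form of (25) is not reproduced in the text. This sketch assumes it is a quadratic in $P_r$, $P_{chiller} = \alpha P_r^2 + \beta P_r + \gamma$. That assumption is consistent with the quadratic growth reported in [18] but is not confirmed for [128]. The "samples" are synthetic and generated from hypothetical constants.

```python
import numpy as np

def fit_chiller_curve(p_r_kw, p_chiller_kw):
    """Fit P_chiller = alpha * P_r**2 + beta * P_r + gamma by least squares."""
    alpha, beta, gamma = np.polyfit(p_r_kw, p_chiller_kw, 2)  # highest power first
    return float(alpha), float(beta), float(gamma)

def chiller_power_kw(p_r_kw, alpha, beta, gamma):
    """Evaluate the assumed quadratic chiller model."""
    return alpha * p_r_kw ** 2 + beta * p_r_kw + gamma

# Synthetic "samples" generated from hypothetical constants.
p_r = np.linspace(100.0, 500.0, 9)
samples = chiller_power_kw(p_r, 5e-4, 0.2, 15.0)
alpha, beta, gamma = fit_chiller_curve(p_r, samples)
```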

3) POWER CONSUMPTION MODEL OF THE COOLING SECTION
As with the IT load section, additive power models are common for modeling the power consumption of the data center's cooling section. An additive model of the power consumption of the data center cooling system is presented in [133] and shown in (26). It includes the CRAC fans, refrigeration by chiller units, the pumps of the cooling distribution unit, lights, humidity control, and other miscellaneous items [133]. $P_{rf}$ is the total power consumption of the refrigeration system, i.e., the cooling system for a raised floor architecture. $P_{CRAC}$ is the power consumed by the computer room air conditioning units. $P_{cdu}$ denotes the power dissipated by the pumps in the cooling distribution unit (CDU), which provides directly cooled water to the rear-door and side-door heat exchangers mounted on the racks. $P_{misc}$ is the power consumed by the miscellaneous loads in the cooling system. This model is almost identical to the raised floor cooling system power consumption model described in [128].
The total power consumption of the CRACs and of the CDUs can be expressed as the sums of the consumptions of the individual units, where i and j index the CRAC and CDU units, respectively. $P_{CRAC}$ and $P_{cdu}$ are the total power consumption of the CRAC and CDU units.
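The additive structure of (26) can be written as a short function. This assumes that (26) is the plain sum $P_{cooling} = P_{rf} + P_{CRAC} + P_{cdu} + P_{misc}$, with $P_{CRAC} = \sum_i P_{CRAC,i}$ and $P_{cdu} = \sum_j P_{cdu,j}$, as the definitions above suggest. The symbol $P_{cooling}$ and all numbers in the example are hypothetical.

```python
def cooling_section_power_kw(p_rf, crac_units, cdu_units, p_misc):
    """Additive cooling model: refrigeration + all CRACs + all CDUs + miscellaneous."""
    p_crac = sum(crac_units)  # P_CRAC: sum over CRAC units i
    p_cdu = sum(cdu_units)    # P_cdu: sum over CDU units j
    return p_rf + p_crac + p_cdu + p_misc

# Three CRAC units, two CDUs, and miscellaneous loads (hypothetical kW values).
total = cooling_section_power_kw(200.0, [15.0, 15.0, 20.0], [5.0, 5.0], 10.0)
```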
A summary of the data center load modeling analysis with the references is given in Table 7.

IV. REVIEW OF THE RELIABILITY MODELING OF DATA CENTERS
Data centers should be environmentally controlled and equipped with power conditioning devices to ensure the reliable operation of the IT loads, including servers and network devices. Data center operators take every possible measure to prevent deliberate or accidental damage to the equipment so that the load sections can operate with a high degree of reliability. By definition, reliability is the probability that a device or system performs its function adequately under specified operating conditions for an intended period of time [134], [135]. Here, the degree of trust placed in success, based on past experience, is quantified as the probability of success of a mission-oriented system such as a data center. This definition of reliability considers only the uninterrupted operational state of the component or system. In contrast, the probability of finding the component or system in the operating state is known as ''availability'', which is used as a reliability index for repairable systems [135]. The components in the data center load sections are repairable, where repair also includes replacement. The availability index is therefore widely used in data center reliability modeling [136].
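The distinction between reliability and availability can be made concrete under the common assumption of a constant failure rate λ. Under that assumption, $R(t) = e^{-\lambda t}$, and a repairable component has the steady-state availability $A = \mathrm{MTTF}/(\mathrm{MTTF} + \mathrm{MTTR})$. The MTTF and MTTR values below are hypothetical.

```python
import math

def reliability(t_h, failure_rate_per_h):
    """Mission reliability R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate_per_h * t_h)

def steady_state_availability(mttf_h, mttr_h):
    """Long-run fraction of time a repairable component is up."""
    return mttf_h / (mttf_h + mttr_h)

MTTF_H, MTTR_H = 100_000.0, 8.0             # hypothetical component data
r_year = reliability(8760.0, 1.0 / MTTF_H)  # probability of no failure in a year
a = steady_state_availability(MTTF_H, MTTR_H)
```

With these numbers, the component is available more than 99.99% of the time, yet its probability of running a full year without any failure is only about 92%. This is why availability, rather than mission reliability, is the usual index for repairable data center equipment.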
The data center industry has come to rely on the ''tier classifications'' introduced by the Uptime Institute as a graded scale based on data center configurations and requirements, from the least reliable (Tier I) to the most reliable (Tier IV) [136]. The Uptime Institute defines these four tiers to characterize the risk of service impact (i.e., unavailability and downtime) due to both service management activities and unplanned failures [137].

A. TIER CLASSIFICATION OF DATA CENTERS
The core objective of the tier classification introduced by the Uptime Institute is to provide a design topology guideline that delivers the levels of availability dictated by the owner's business case [136], [137]. The tier of a data center is determined by the availability of the IPCS, including the utility and backup generator supply [136], [137]. The Uptime Institute pioneered efforts to standardize data center design and to describe the redundancy of the underlying power supply systems. According to the Uptime Institute's classification system, the internal infrastructure of data centers has evolved through at least four distinct stages over the last 40 years. These stages are used for reliability modeling and are known as the ''Tiers of Data center'' [136]- [138]. As of April 2013, the Uptime Institute had awarded 236 certifications for data centers around the world based on the tier classification [139]. The classification combines quantitative and qualitative approaches, as depicted in Figure 8. The Uptime Institute uses this combination for tier certification; however, the reliability assessment approach depends on the data center owner's business case, which is discussed further in Section IV-D. The tier classification system evaluates data centers by their capability to allow maintenance and to withstand a failure in the power supply system. Tiers I (least reliable) to IV (most reliable) are defined by the redundant components in the parallel power supply paths to the critical load sections. However, the deterministic approach used in [136], [139] to calculate the availability of the different tiers ignores three factors: the outage probability of the grid supply, the different failure rates of the IPCS components, and random failure modes in the power supply paths. The specifications and redundancy options from [136]- [138] are summarized in Table 8.
The tier availabilities given in Table 8 are criticized in [140], which reports lower availability values. Because [140] considers more detailed failure possibilities in the internal infrastructure of the data center, the estimated risk of failure increases and the availability therefore decreases. Thus, redundancy in the power supply path does not necessarily improve the availability of the data center. Availability can degrade due to common mode failures, which calls for statistical failure data in further research. The crude framework and design philosophy provided in [136]- [138] are still useful, but the results rest on several assumptions, as follows:
• The fault tolerance of the tiers does not depend solely on the redundancy of the power supply path, because common mode failures are possible. The impact of common mode failures in rack-level PSUs on the availability of the servers is presented in [113].
• The studies consider only single points of failure at specific critical output distribution points, such as PDUs, and propose dual-corded PDUs as a solution for Tier IV data centers. However, it is argued in [8] that dual-corded PDUs can also fail to supply the required power to the servers because of a power supply capacity shortage.
• The articles follow a deterministic approach with constant component failure and repair rates to assess the availability of small IPCSs. The IPCSs of real data centers, however, are large and complex, with many uncertainties regarding component outages at different levels of the IPCS.
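The effect of common mode failures on redundant paths can be illustrated with the beta-factor model. This is a standard common-cause failure model, swapped in here for illustration; it is not the method of [113] or [140]. A fraction β of each path's failure rate is treated as common cause, so that part acts in series with the independent parallel pair. The expression is an approximation, and the rates below are hypothetical.

```python
def availability(lam, mu):
    """Steady-state availability of a repairable block with rates lam and mu."""
    return mu / (lam + mu)

def redundant_pair_availability(lam, mu, beta=0.0):
    """Two identical parallel paths under a beta-factor common-cause split.

    lam:  total failure rate of one path (1/h)
    mu:   repair rate (1/h), assumed equal for both failure types
    beta: fraction of lam attributed to common-cause failures
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    a_ind = availability((1.0 - beta) * lam, mu)  # independent part of each path
    a_cc = availability(beta * lam, mu)           # shared common-cause block
    # Approximation: independent pair in parallel, common-cause block in series.
    return a_cc * (1.0 - (1.0 - a_ind) ** 2)

ideal = redundant_pair_availability(1e-4, 0.1, beta=0.0)
with_ccf = redundant_pair_availability(1e-4, 0.1, beta=0.1)
```

With β = 0.1, the unavailability of the redundant pair is dominated by the common-cause term. The idealized independent-redundancy estimate (β = 0) therefore overstates the availability.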

B. FACTORS TO CONSIDER FOR THE RELIABILITY MODELING IN DATA CENTERS
The most important factor in assessing the reliability of a data center is the failure of the components in the system. Arno et al. illustrate this in [136] with the following example: ''If the UPS in the power supply system fails and all the connected loads for the data center lose power, that would obviously be a ''failure.'' But what about one 20 A circuit breaker trips and one rack of equipment losing power? Is that a ''failure'' for the data center?'' [136] According to the definition of failure given in Chapter 8 of the IEEE Gold Book, Standard 493-2007 [141], ''the failure is the loss of power to a power distribution unit (or UPS distribution panel in case of the data center).'' Thus, the loss of an entire UPS would impact the overall mission of the connected facility and is, by definition, a failure of the data center. However, if a circuit breaker trips and the connected racks lose power, this is not considered a failure of the data center. Instead, the servers in those racks are considered failed or unavailable for operation. Therefore, the first step of any reliability analysis is to define the ''failure state'' of the studied system. A similar distinction between ''error'' and ''failure'' is made in [7] for a cloud system. There, ''failure'' refers to fatal, irreparable faults that catastrophically impact system operation. ''Errors'', by contrast, degrade the system performance (e.g., increased latency or decreased throughput) but can be resolved automatically, allowing the system to recover to its initial state [7]. Additionally, the failure definition in the IEEE Gold Book, Standard 493-2007 [141] conflicts with the tier classification of data centers. This indicates that a failure state with degraded performance needs to be defined for data centers. The reliability analysis in [113] is an example in this regard.
The failures of the rack-level PSUs are considered in [113] to assess the adequacy of the computational resources, hence the degraded performance of the data center.

C. RELIABILITY INDICES AND METRICS USED FOR RELIABILITY MODELING IN DATA CENTERS
In this section, the reliability indices and metrics used in the literature for data center reliability modeling are analyzed in three groups according to the load sections. The application of reliability indices differs across load sections because the reliability assessment outcomes are interpreted differently for each load section, as mentioned in Section IV-B.

1) RELIABILITY INDICES FOR IT LOADS AND SERVICES
The indices used for the IT load section can be classified into two groups: (1) indices related to IT performance and services, and (2) indices related to the readiness of the IT load section in the data center. The QoS is a key indicator for assessing the performance of the data center. It includes the Key Quality Indicators (KQI) and Key Performance Indicators (KPI) of the IT services provided by the data center, as explained in [7]. These indices are used for IT service monitoring and computational capacity management in the data center. The ''service reliability'' and ''service availability'' indices are used to maintain the SLA with the client or user of the data center. In other words, service availability or reliability characterizes the readiness of a data center system to deliver the promised IT service to a user. A system that is ready is commonly referred to as being ''up'' [142]. Mathematically, the service availability is estimated as given in (27), where $A_{Service}$ is the service availability and $t_{up}$ and $t_{down}$ are the uptime and downtime of the system, respectively. The ''service availability'' index is used in [8] to address the reliability of the IT loads, i.e., the servers at rack level. The authors also use the server ''outage probability'' as a reliability index, which varies with increasing power losses in the IPCS. Apart from the probability of outages, the ''service reliability'' is also emphasized in [142], since this index characterizes the probability of fulfilling service requests without latency. Importantly, session-oriented services in data centers measure two probabilities: that of successfully initiating a session with a service, called ''accessibility'', and that of a session delivering the service with the promised QoS until it terminates, called ''retainability'' [142].
In this regard, Defects Per Million Operations (DPM) is an index that measures the number of failed operations per million attempted operations to assess system reliability, as given in (28).
where $R_{DPM}$, $O_f$, and $O_a$ are the defects per million operations, the number of failed operations, and the number of attempted operations, respectively, and the service reliability is denoted by $r_{Service}$. Another index, ''service latency'', is highlighted in [143], [144] as important for assessing system reliability, especially for edge and internet data centers. Transaction latency directly impacts the quality of experience of end users. According to [142], a 500 ms increase in service latency causes a 20% traffic reduction for Google.com, and a 100 ms increase causes a 1% reduction in sales for Amazon.com. In [145], the service availability index is extended to account for the average CPU load level (and hence the computational workload) and the CPU hazard function, yielding a new index called ''load-dependent machine availability''. Similar load-dependent reliability indices named ''average performance'' and ''average delivered availability'' are proposed in [146].
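The uptime-based and operation-count-based indices above can be sketched as follows. The sketch rests on these assumptions:

- (27) is the usual uptime ratio $A_{Service} = t_{up}/(t_{up}+t_{down})$.
- (28) takes the common DPM form $R_{DPM} = 10^6\,O_f/O_a$, with $r_{Service} = 1 - O_f/O_a$.
- The downtime conversion uses an 8,760-hour (non-leap) year.

These forms and the example numbers are assumptions.

```python
HOURS_PER_YEAR = 8760.0  # non-leap year, assumed for the downtime conversion

def service_availability(t_up_h, t_down_h):
    """A_Service = t_up / (t_up + t_down)."""
    return t_up_h / (t_up_h + t_down_h)

def annual_downtime_h(availability):
    """Expected downtime per year implied by an availability value."""
    return (1.0 - availability) * HOURS_PER_YEAR

def defects_per_million(failed_ops, attempted_ops):
    """R_DPM: failed operations per million attempted operations."""
    if attempted_ops <= 0:
        raise ValueError("attempted_ops must be positive")
    return failed_ops * 1_000_000 / attempted_ops

def service_reliability(failed_ops, attempted_ops):
    """r_Service: fraction of attempted operations that succeeded."""
    return 1.0 - failed_ops / attempted_ops

a = service_availability(8751.24, 8.76)   # approximately 0.999 ("three nines")
downtime = annual_downtime_h(a)           # approximately 8.76 h per year
dpm = defects_per_million(25, 5_000_000)  # 5.0
r = service_reliability(25, 5_000_000)    # approximately 0.999995
```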
The QoS and service reliability indices share the same basis: they mostly rely on indicators of service availability and IT system performance. The IT system performance indicators are modeled in different ways; one example is given in (28).
Apart from the QoS-oriented reliability indices mentioned above, other indices, e.g., Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR), availability, and reliability, are used in reliability modeling of the physical components of the IT load section [147]- [149]. A related reliability index called ''loss of workload probability (LOWP)'' is proposed in [113] based on the probability of server outages at rack level. The risk of server outages due to electrical faults and the consequent voltage dips is analyzed in [101].
Additionally, SLA-aware indices based on IT load performance are used for software-based solutions in data centers. Two such indices, Performance Degradation Due to Migration (PDM) and Service Level Agreement Violation (SLAV), are applied to evaluate the performance of the IT loads with consolidated workloads in cloud systems [150], [151].
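PDM and SLAV are commonly defined as in the VM consolidation work of Beloglazov and Buyya:

- PDM averages, over M VMs, the CPU capacity lost to migrations relative to the capacity each VM requested.
- SLAV multiplies PDM by SLATAH, the average fraction of active time during which hosts ran at 100% utilization.

Whether [150], [151] use exactly these definitions is an assumption. The sketch and its numbers are illustrative only.

```python
def slatah(overload_times_h, active_times_h):
    """SLA violation Time per Active Host: mean of T_s,i / T_a,i over N hosts."""
    ratios = [ts / ta for ts, ta in zip(overload_times_h, active_times_h)]
    return sum(ratios) / len(ratios)

def pdm(degraded_capacity, requested_capacity):
    """Performance Degradation due to Migrations: mean of C_d,j / C_r,j over M VMs."""
    ratios = [cd / cr for cd, cr in zip(degraded_capacity, requested_capacity)]
    return sum(ratios) / len(ratios)

def slav(slatah_value, pdm_value):
    """Combined SLA violation metric: SLAV = SLATAH * PDM."""
    return slatah_value * pdm_value

s = slatah([10.0, 0.0], [100.0, 100.0])  # hosts overloaded 10 h and 0 h out of 100 h
d = pdm([2.0, 4.0], [100.0, 100.0])      # 2% and 4% of requested capacity lost
v = slav(s, d)
```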

2) RELIABILITY INDICES FOR IPCS SECTION
The indices used to assess the reliability of the IPCS in the data center are compiled and explained in [136]. The authors of [136] specify five reliability indices for assessing the reliability of the IPCS in data centers: MTBF, MTTR, availability, severity, and risk (measured in terms of the financial losses caused by a failure). These indices are significantly affected by the definition of ''failure'' used for the studied system architectures, as explained in Section IV-B. They are also affected by the size of the facility and the number of critical loads in the studied models [114]. A different reliability assessment approach is presented in [8]. There, the authors show that increasing power losses in the IPCS raise the probability of a power supply capacity shortage at the PDUs, which eventually results in server outages and hence failures of the IT loads. The ''outage probability'' index is used for the PDUs to relate them to the service availability of the IT loads [8].
There are also reliability studies focused on individual IPCS components (i.e., UPS, PDU, and PSU). These articles are outside the scope of this review, since component-based research focuses on the lifetime enhancement, cost-effectiveness, and energy efficiency of a particular component rather than on data center applications.

3) RELIABILITY INDICES FOR COOLING SECTION
A reliability evaluation method for a hybrid cooling system combined with a lake water heat sink for a data center is presented in [152], where the operational availability index $A_O(\infty)$ is used for repairable system components. In [152], the operational availability index is defined as the probability that the system will be in the intended operational state, and it is expressed mathematically as a function of the system failure rate $\lambda_{sys}$ and repair rate $\mu_{sys}$, as given in (29). Another reliability index, the functional availability $A_f(\infty)$, is also used in [152] based on the predicted server room temperature and the servers' working conditions, as given in (30). As explained in [152], the overall functional availability of the data center cooling system is mainly determined by the operational availability, the heat density, the heat transfer characteristics governing room temperature, the start-up time of the cooling system, and the repair time after a cooling system failure.
$$A_O(\infty) = \frac{\mu_{sys}}{\lambda_{sys} + \mu_{sys}} \qquad (29)$$

$$A_f(\infty) = A_O(\infty)\left(1 - p_{us}\right) + \left[1 - A_O(\infty)\right] p_t \qquad (30)$$

where $A_O(\infty)$ and $A_f(\infty)$ are the operational and functional availability of the cooling system. The failure rate and repair rate of the cooling system are $\lambda_{sys}$ and $\mu_{sys}$, respectively. $p_{us}$ is the probability that the room temperature is outside the intended range while the system is in operation, and $p_t$ is the probability that the room temperature remains at its intended value when the cooling system fails. A similar functional-condition-based analysis has been performed for the data center air-conditioning system based on the air-conditioning power supply capacity [153].
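A small numerical sketch of the two cooling availability indices: it assumes the standard two-state repairable model $A_O = \mu/(\lambda+\mu)$ and obtains the functional availability by total probability over the "cooling up" and "cooling failed" states, using $p_{us}$ and $p_t$ as defined above. The rates and probabilities in the usage block are hypothetical.

```python
def operational_availability(lam: float, mu: float) -> float:
    """Two-state repairable system: A_O = mu / (lambda + mu)."""
    return mu / (lam + mu)


def functional_availability(lam: float, mu: float, p_us: float, p_t: float) -> float:
    """Probability that the room temperature is within the intended range.

    Total probability over the cooling system being up (temperature still out
    of range with probability p_us) or failed (temperature still acceptable
    with probability p_t, e.g. thanks to thermal inertia of the room).
    """
    a_o = operational_availability(lam, mu)
    return a_o * (1.0 - p_us) + (1.0 - a_o) * p_t


if __name__ == "__main__":
    lam, mu = 1e-3, 0.1  # hypothetical rates per hour (MTTF 1000 h, MTTR 10 h)
    print(f"A_O = {operational_availability(lam, mu):.6f}")
    print(f"A_f = {functional_availability(lam, mu, p_us=0.001, p_t=0.2):.6f}")
```

With $p_t > 0$ the functional availability can exceed the operational availability, which reflects the point made in [152] that start-up and repair times relative to the room's thermal behavior matter, not only the equipment state.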
A different reliability modeling approach is applied in [9], where the authors emphasize the dependability of the cooling system, since dependability is related to both fault tolerance and reliability. The reliability importance ($I_i$) and reliability-cost importance indices are used in [9], as given in (31), where $I_i$ is the reliability importance of component $i$; $p_i$ represents the component reliability vector with the $i$th component removed; $D_i$ and $U_i$ represent the failure and up states of component $i$, respectively; $C_i$ is the acquisition cost of component $i$; and $C_{sys}$ is the system acquisition cost. Apart from these indices, typical indices such as availability based on MTBF, MTTR, and failure and repair rates are widely used for reliability modeling of the cooling load section of data centers [154], [155]. It is important to mention that research on data center cooling system reliability has not yet been adequately addressed, although the cooling infrastructure of commercial buildings has drawn intensive research interest over the last decade. Research on reliable data center cooling infrastructure is much needed, since the server hall of a data center is considerably more temperature-sensitive than other building facilities [154]. One of the very few recent articles that critically evaluate the reliability of data center cooling systems is [154].
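The reliability importance described above is the difference in system reliability when component $i$ is forced into its up state versus its failed state, often called Birnbaum importance. The sketch evaluates it for a hypothetical cooling loop (a chiller in series with two redundant pumps and a cooling tower); the topology and reliabilities are invented for illustration, and the cost-weighted index of [9] is not reproduced because its exact form is not given here.

```python
from typing import Callable, Dict

Reliabilities = Dict[str, float]


def cooling_loop_reliability(p: Reliabilities) -> float:
    """Hypothetical loop: chiller -> (pump_a || pump_b) -> cooling tower."""
    pumps = 1.0 - (1.0 - p["pump_a"]) * (1.0 - p["pump_b"])
    return p["chiller"] * pumps * p["tower"]


def reliability_importance(system: Callable[[Reliabilities], float],
                           p: Reliabilities, component: str) -> float:
    """I_i = R_sys(U_i, p_i) - R_sys(D_i, p_i): component i forced up vs. down."""
    up = {**p, component: 1.0}
    down = {**p, component: 0.0}
    return system(up) - system(down)


if __name__ == "__main__":
    p = {"chiller": 0.95, "pump_a": 0.90, "pump_b": 0.90, "tower": 0.98}
    for name in p:
        imp = reliability_importance(cooling_loop_reliability, p, name)
        print(f"{name:8s} I = {imp:.4f}")
```

In this example the single-point-of-failure components (chiller, tower) score roughly ten times higher than either redundant pump, which is the kind of ranking the importance index is meant to expose.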

D. METHODOLOGIES USED FOR RELIABILITY MODELING
Different research methods have been used for reliability modeling of the individual load sections of the data center and of the data center as a complete system. The proposed methodologies can be classified into two groups, i.e., analytical approaches and simulation-based approaches [156].

1) ANALYTICAL APPROACHES FOR RELIABILITY ASSESSMENT
Analytical approaches such as Reliability Block Diagrams (RBD) and fault tree analysis are very common in data center reliability modeling because of their simplicity and low computational requirements. One of the earliest such studies was published in 1988 [157], where the authors analyzed the unavailability of the distributed power supply system of a telecommunication control room and compared it with that of a centralized power distribution system. A similar analytical approach is explained in [158], where the reliability of a typical Alternating Current (AC) distribution system is compared with that of a Direct Current (DC) power distribution system in data centers using RBD. Only the failures of the power distribution system are considered in [158], [159], without the failures of the IT loads, whereas the availability of the IPCS considering the failure probability of the IT loads, including PSUs, is presented in [8]. The reliability of different IPCS structures, depending on the voltage level in the IPCS, is evaluated using RBD in [160], [161]. Reliability modeling of the computational resource infrastructure (IT load section) of data centers has been conducted using RBD models in [162], [163]. A similar type of analysis is presented in [164], where the authors use directed and undirected graphs with minimal cut sets. The analytical approach is also applied to evaluate the reliability of data center network topologies using cut set theory [165], [166], and to optimize resource allocation for reliable networks [167]. Analytical approaches, i.e., RBD, stochastic Petri nets, and an energy flow model, are used for reliability assessment of the IPCS in [168]. An extended RBD model is proposed in [169] that can account for the dependencies among IPCS components when evaluating the overall reliability of the IPCS.
The proposed model is called Dynamic RBD; as explained in [169], it is further compared with a colored Petri net model to analyze its behavioral properties, which verifies the correctness of the proposed model for IPCS reliability. Fault tree analysis is used to estimate the failure rates, MTBF, MTTR, and reliability of different UPS topologies in [170]- [172].
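The RBD and cut-set ideas above can be sketched in a few lines. Series blocks multiply availabilities, parallel (redundant) blocks multiply unavailabilities, and the minimal cut set approach approximates system unavailability as the sum, over minimal cut sets, of the product of component unavailabilities (the rare-event approximation). The 2N UPS topology and all availability values below are hypothetical.

```python
from math import prod
from typing import Dict, FrozenSet, Iterable


def series(*avail: float) -> float:
    """All blocks must work: A = product of A_i."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """At least one block must work: A = 1 - product of (1 - A_i)."""
    return 1.0 - prod(1.0 - a for a in avail)


def cut_set_unavailability(cut_sets: Iterable[FrozenSet[str]],
                           unavail: Dict[str, float]) -> float:
    """Rare-event approximation: Q_sys ~ sum over minimal cut sets of prod q_i."""
    return sum(prod(unavail[c] for c in cs) for cs in cut_sets)


if __name__ == "__main__":
    # Hypothetical 2N UPS feeding a single PDU and a single server PSU.
    a = {"ups_a": 0.999, "ups_b": 0.999, "pdu": 0.9999, "psu": 0.9995}
    exact = series(parallel(a["ups_a"], a["ups_b"]), a["pdu"], a["psu"])
    cuts = [frozenset({"ups_a", "ups_b"}), frozenset({"pdu"}), frozenset({"psu"})]
    approx = cut_set_unavailability(cuts, {k: 1.0 - v for k, v in a.items()})
    print(f"exact unavailability  = {1.0 - exact:.3e}")
    print(f"cut-set approximation = {approx:.3e}")
```

The cut-set form is slightly pessimistic but scales to structures that are not pure series-parallel, which is why it is used for meshed network topologies.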
RBD is also used for data center cooling system reliability analysis in [9], [152]. The availability of a water-cooled system is evaluated using the maximum allowable downtime in the RBD model proposed in [152], while RBD and stochastic Petri net models are used to quantify the sustainability impacts, costs, and dependability of data center cooling infrastructure in [9]. A comparison of the reliability of data center sub-systems is presented in [148], where the reliability of the network, electrical, and thermal systems of the data center is modeled using failure modes, effects, and criticality analysis (FMECA) and an energy flow model (EFM). The methodologies proposed in these articles are evaluated using the components' statistical failure and repair data. Common sources of such data exist for industrial applications, e.g., [173], [174]. However, data center infrastructure is more critical than that of typical buildings and industries, as argued in [154]. Statistical failure data of data center components are therefore needed for further research to improve component reliability in data center applications. A publicly available data set reports the failure and repair times of servers [175], while the failure and repair data of other components (e.g., PDUs, PSUs, and cooling devices) are not part of any publicly available data set. The main reason behind the lack of such data sets is data center operators' tendency to keep the internal information of their data centers confidential [176].
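When failure and repair records such as those in [175] are available, the parameters used by the analytical models above can be estimated directly. The sketch assumes a hypothetical log format of (failure time, restore time) pairs in hours for a single repairable unit, with non-overlapping outages fully inside the observation window and constant failure and repair rates; it returns point estimates only, without confidence intervals.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class FailureStats:
    mtbf_h: float
    mttr_h: float
    failure_rate: float  # lambda, per hour
    repair_rate: float   # mu, per hour


def estimate_from_log(events: List[Tuple[float, float]],
                      observation_h: float) -> FailureStats:
    """Point estimates from hypothetical (failure_time, restore_time) pairs."""
    if not events:
        raise ValueError("no failures observed; MTBF cannot be point-estimated")
    repairs = [restored - failed for failed, restored in events]
    if any(r < 0.0 for r in repairs):
        raise ValueError("restore time precedes failure time")
    uptime = observation_h - sum(repairs)
    mtbf = uptime / len(events)  # total up time per observed failure
    mttr = mean(repairs)
    return FailureStats(mtbf, mttr, 1.0 / mtbf, 1.0 / mttr)


if __name__ == "__main__":
    log = [(1000.0, 1004.0), (5000.0, 5002.0), (9000.0, 9006.0)]  # hypothetical
    print(estimate_from_log(log, observation_h=10_000.0))
```

With only a handful of events, as is typical for highly reliable data center equipment, such estimates are very uncertain, which reinforces the need for larger shared data sets.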

2) SIMULATION-BASED APPROACHES OF RELIABILITY ASSESSMENT
Along with analytical models, probabilistic modeling approaches are also common for data center reliability assessment. State space models, including Markov models and Markov chain Monte Carlo (MCMC), are used for reliability modeling of large-scale and repairable systems, and Markov models have therefore recently become popular for reliability modeling of data centers [177], [178]. To avoid a time-variant, non-linear state space model, the failure and repair rates of the components of the studied systems are assumed to be constant in Markov models. This assumption holds for a component only if aging effects are ignored [135]. Therefore, simulation-based models, which are not restricted to constant rates, are widely used in research nowadays for assessing the reliability of data centers.
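A minimal sketch of a constant-rate Markov model: a hypothetical pair of redundant units with a single repair crew, modeled as a three-state continuous-time Markov chain (0, 1, or 2 units failed). The steady-state probabilities solve $\pi Q = 0$ with $\sum \pi = 1$; a small Gaussian elimination is included so the example runs without third-party libraries. The rates are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def steady_state(q: Matrix) -> List[float]:
    """Solve pi Q = 0, sum(pi) = 1 by replacing one balance equation."""
    n = len(q)
    a = [[q[j][i] for j in range(n)] for i in range(n)]  # Q transposed
    a[-1] = [1.0] * n                                   # normalisation row
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)


def two_unit_generator(lam: float, mu: float) -> Matrix:
    """States: 0, 1 or 2 units failed; one repair crew (constant rates)."""
    return [
        [-2.0 * lam, 2.0 * lam, 0.0],
        [mu, -(lam + mu), lam],
        [0.0, mu, -mu],
    ]


if __name__ == "__main__":
    pi = steady_state(two_unit_generator(lam=1e-4, mu=0.1))  # hypothetical rates
    print(f"availability (>= 1 unit up) = {pi[0] + pi[1]:.9f}")
```

The same construction extends to larger state spaces, but the number of states grows quickly with the number of components, and the constant-rate assumption stays built into the generator matrix.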
Monte Carlo simulation is one of the most widely used simulation-based approaches for data center reliability modeling. It is mostly used to generate time-dependent failure and repair events of the system components from probability distribution functions and to observe the overall system performance based on the resulting stochastic data [156], [179], [180]. Monte Carlo simulation is also used for reliability modeling of components used in data centers, such as UPSs [181], [182] and optical network systems [180]. In simulation-based approaches, the failure model of each system component is important, since the simulated overall reliability can vary depending on the failure mode, especially for high-reliability applications like data centers [8]. For example, a Tier IV data center is required to have an availability of five to six 9's (99.999% to 99.9999%), which means that very few failure events will be observed in a million stochastic events. Therefore, accurate consideration of failure modes and accurate component failure modeling are important in simulation-based reliability modeling of data centers. Apart from the number of samples and the failure modes of the components, the probability distribution functions of the failure and repair events of the components in the studied system also play a crucial role in simulation-based reliability modeling. The probability distribution functions and their applications to reliability modeling of data center servers are analyzed in [183]- [185]. The distribution functions of the failure and repair times of network devices and other server components, i.e., hard disks, memory, and network cards, are presented in [176] and further used for reliability modeling of the overall system. Besides Monte Carlo simulation, stochastic Petri nets [32] and Markov chain Monte Carlo (MCMC) [186] are also used for reliability modeling of data centers.
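The Monte Carlo procedure described above can be sketched as an alternating up/down simulation for a single repairable component. Here, as an assumption for illustration, times to failure follow a Weibull distribution (shape > 1 models wear-out, which a constant-rate Markov model cannot represent), repairs are exponential, and each repair restores the unit to as-good-as-new. The long-run estimate is checked against the renewal-theory value MTTF / (MTTF + MTTR); all parameters are hypothetical.

```python
import math
import random
from typing import Tuple


def simulate_availability(shape: float, scale_h: float, mttr_h: float,
                          horizon_h: float, runs: int,
                          seed: int = 1) -> Tuple[float, int]:
    """Estimate availability by sampling alternating failure/repair events."""
    rng = random.Random(seed)
    total_up = 0.0
    failures = 0
    for _ in range(runs):
        t = 0.0
        while t < horizon_h:
            ttf = rng.weibullvariate(scale_h, shape)  # (scale, shape)
            if t + ttf >= horizon_h:
                total_up += horizon_h - t  # survives to the end of the horizon
                break
            total_up += ttf
            failures += 1
            t += ttf + rng.expovariate(1.0 / mttr_h)  # add repair duration
    return total_up / (runs * horizon_h), failures


def renewal_availability(shape: float, scale_h: float, mttr_h: float) -> float:
    """Long-run availability MTTF / (MTTF + MTTR) for the same model."""
    mttf = scale_h * math.gamma(1.0 + 1.0 / shape)
    return mttf / (mttf + mttr_h)


if __name__ == "__main__":
    a_mc, n = simulate_availability(1.5, 1000.0, 10.0, 100_000.0, runs=50)
    print(f"Monte Carlo A = {a_mc:.5f} from {n} failures")
    print(f"renewal A     = {renewal_availability(1.5, 1000.0, 10.0):.5f}")
```

Raising the scale parameter toward data center component values shows the rare-event problem directly: the failure count per run drops toward zero, so many more samples are needed for the same precision.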

E. DEPENDABILITY OF THE DATA CENTER LOAD SECTIONS AND SUB-SYSTEMS
The dependability of a system is defined as its ability to deliver a service that can justifiably be trusted [187]. Alternatively, the dependability of a system is defined as the criterion for deciding whether its service is dependable [188]. For example, the dependence of system A on system B represents the extent to which system A's dependability is (or would be) affected by that of system B. The dependability of a system can be represented by attributes such as availability, reliability, integrity, and maintainability [188]. This section analyzes the dependability of the data center sub-systems and load sections, since the service availability of the data center depends on the continuity of the services provided by the components of the sub-systems, as explained in Section IV-C.

1) DEPENDABILITY ON THE COOLING LOAD SECTION
A dependability analysis of data center sub-systems is presented in [148], where the authors consider the availability of the three major sub-systems (electrical, cooling, and network) and evaluate the impact of each sub-system's availability on data center reliability. The impacts of the availability of the electrical and thermal sub-systems on the overall reliability of the data center are presented in [32]. The impact of ambient temperature on the overall reliability and energy efficiency of data centers is analyzed in [147]. It is shown there that the battery life in the IPCS is reduced by 50% for a 10 °C increase in operating temperature, and that the lifetime of passive elements in the servers, such as capacitors, is likewise reduced by 50% for a 10 °C increase in temperature [147]. The author of [147] also concludes that increasing the data hall temperature improves energy efficiency but degrades the reliability of the servers and of the PSUs in the IPCS. The power consumption of the cooling loads depends on the arrangement of the servers in the data hall; hence, a dense server arrangement causes high energy consumption by the cooling loads [32]. Additionally, overloaded cooling loads and a larger number of under-utilized or idle servers increase network and storage latency, which also affects the overall reliability of the data center under different server placement strategies, as explained in [32]. However, these articles do not consider the power losses of the IPCS when evaluating data center reliability.
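The temperature effect reported in [147] can be turned into a simple derating rule for rough what-if estimates. The sketch assumes that the 50% reduction per 10 °C applies continuously, i.e., lifetime scales as $2^{-\Delta T/10}$; this exponential interpolation, the reference temperature, and the rated life are assumptions for illustration, not results from [147].

```python
def derated_lifetime(ref_life: float, ref_temp_c: float, temp_c: float,
                     halving_step_c: float = 10.0) -> float:
    """Lifetime halves for every `halving_step_c` degrees above the reference.

    Applying the rule continuously (and below the reference temperature) is an
    assumption; [147] only reports a 50 % reduction for a 10 degC increase.
    """
    return ref_life * 2.0 ** (-(temp_c - ref_temp_c) / halving_step_c)


if __name__ == "__main__":
    # Hypothetical UPS battery rated for 10 years at 25 degC.
    for t in (25.0, 30.0, 35.0, 40.0):
        print(f"{t:.0f} degC -> {derated_lifetime(10.0, 25.0, t):.2f} years")
```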

2) DEPENDABILITY ON THE IPCS
The impacts of power losses on the service availability of the IT loads of the data center are analyzed in [8]. The service availability of the servers, and hence of the IT services of the data center, is quantified considering the total power losses in the IPCS. According to [8], server outages could affect up to 20% of the installed capacity because of the power losses of the PDUs in the IPCS. Moreover, the impacts of electrical faults and unwanted outages in the IPCS on server outages in data centers are presented in [101], [113]. Faults in the IPCS cause voltage dips that can trip the PSUs and thus the servers, as explained in [101]. The amount of workload that cannot be handled because of such unwanted failures is quantified in [113]; in extreme cases, such a failure could put almost 33% of the installed servers out of order [101]. The reliability-centric dependability analysis is further extended in [189], [190] to control computational resources so as to reduce the overall power consumption, and hence the number of active servers in the data center, by balancing and scheduling the workloads. The term ''right-sizing'' is used in this regard, although right-sizing is also used for reducing the number of idle servers based on data traffic and the negotiated SLA in [191]. The number of active servers is optimized by workload consolidation through virtualization, as proposed in [192], [193]. A different approach is presented in [113], where the authors determine the required number of servers per rack considering the workloads and the stochastic failure of PSUs in the IPCS. The broader aim of these articles is to improve the reliability and energy efficiency of the data center, for which energy consumption models of the load sections are necessarily used. The energy consumption models are used either for internal structural modifications to reduce power losses [189], [190] or for allocating servers to minimize consumption [191].
Therefore, the trade-off between energy efficiency and reliability enhancement must be considered in data center operation, and the energy consumption models of the data center load sections are often necessary for data center reliability assessments.
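The trade-off between provisioning fewer servers (lower consumption) and tolerating stochastic PSU failures can be illustrated with a simplified binomial model. The sketch assumes each server, together with its PSU, is available independently with probability p, and finds the smallest number of installed servers such that at least k are working with a target probability. This is a hypothetical illustration of the sizing question, not the method of [113], and independence ignores common-cause events such as the voltage dips discussed in [101].

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n independent servers are available)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def min_servers(k: int, p: float, target: float, n_max: int = 10_000) -> int:
    """Smallest installed count n >= k meeting the availability target."""
    for n in range(k, n_max + 1):
        if prob_at_least(k, n, p) >= target:
            return n
    raise ValueError("target not reachable within n_max servers")


if __name__ == "__main__":
    # Hypothetical rack: 10 servers needed, each available with p = 0.99.
    n = min_servers(k=10, p=0.99, target=0.999)
    print(f"install {n} servers -> P = {prob_at_least(10, n, 0.99):.6f}")
```

Every extra server raises availability but also adds idle power draw, so the target probability effectively sets the energy cost of reliability for the rack.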
A summary of the data center reliability analysis with the references is given in Table 9.

V. LIMITATIONS AND FUTURE WORKS
This paper does not consider the energy management techniques used to improve the efficiency of data centers, as the authors focus on the energy consumption models of the major data center components. Moreover, the adoption of sustainable and green energy sources in data centers poses new challenges for data center operation. The impacts of green technologies, e.g., renewable energy generation and free cooling techniques, on energy consumption modeling approaches are not addressed in this paper. The adoption of sustainable energy sources in data centers and its impacts on data center reliability will be analyzed in future work.
The detailed mathematical models of the different simulation methods, e.g., Monte Carlo, Markov chain Monte Carlo, and stochastic Petri nets, are not included in this review. These models are used as tools for data center reliability assessment. This paper only reviews the applications of these models in simulation-based reliability assessment of data centers without considering their mathematical formulation, which could be considered in further study.

VI. CONCLUSION AND RECOMMENDATIONS
As the backbone of today's information and communication technology (ICT) developments, data centers must be operated with high energy efficiency and high reliability. In this paper, the energy consumption modeling and reliability modeling aspects of data centers are reviewed. The review reveals the state of the art of these topics and the research gaps in the published review articles. This paper contributes to filling the research gaps related to data center energy consumption modeling by analyzing the energy consumption models of the data center load sections, which will ease the application of these models in further research. It is worth mentioning that this paper reviews data center reliability assessment models and methodologies for the first time, and it presents the existing research gaps as recommendations. The identified research gaps, and hence the recommendations based on the review of data center reliability assessment, need to be addressed by future researchers to ensure the adoption of new equipment and technologies in data centers. Additionally, the review reveals that the energy consumption models of data center components are often necessary for data center reliability models, although the energy consumption models also have other applications in data center energy management (summarized in Table 1).
Based on the review of the energy consumption models of data center components, this paper recommends placing more emphasis on the availability of the model parameters and variables than on accuracy when applying these models in research. Higher model accuracy often makes the application complicated and may contribute little to the improvement of the proposed methodology. Additionally, this review identifies a lack of research on energy consumption modeling of the internal power conditioning system (IPCS) equipment. The total power consumption of the IPCS could reach up to 10% of the total demand of the data center, and it could also cause outages and reliability issues in data centers. This review also shows the relation between the power consumption and the reliability of the data center and, as a recommendation, concludes that more research should be conducted to reduce power consumption, especially in the IPCS section.
The review of data center reliability modeling aspects in this paper shows the need for a standard code for data center operation alongside the existing tier classification, which is given as a recommendation. The analysis also shows the state of the art of analytical and simulation-based reliability modeling approaches, which could help future researchers choose suitable models for their applications. The analysis has shown the need for statistical failure and repair data of data center components, which are rarely available because operators are unwilling to share them. Therefore, it is recommended that component failure and repair statistics be published so that they can be used in further research. This paper also recommends placing more focus on the reliability analysis of the cooling section and analyzing in more detail how the overall reliability of the data center depends on the other load sections. In essence, this review has identified a few research gaps and a number of recommendations for researchers to continue research on, and improve the understanding of, data center energy consumption and reliability modeling.