Design and operation of urban wastewater systems considering reliability, risk and resilience

Reliability, risk and resilience are strongly related concepts and have been widely utilised in the context of water infrastructure performance analysis. However, there are many ways in which each measure can be formulated (depending on the reliability of what, risk to what from what, and resilience of what to what) and the relationships will differ depending on the formulations used. This research has developed a framework to explore the ways in which reliability, risk and resilience may be formulated, identifying possible components and knowledge required for calculation of each and formalising the conceptual relationships between specified and general resilience. This utilises the Safe & SuRe framework, which shows how threats to a water system can result in consequences for society, the economy and the environment, to enable the formulations to be derived in a logical manner and to ensure consistency in any comparisons. The framework is used to investigate the relationship between levels of reliability, risk and resilience provided by multiple operational control and design strategies for an urban wastewater system case study. The results highlight that, although reliability, risk and resilience values may exhibit correlations, designing for just one is insufficient: reliability, risk and resilience are complementary rather than interchangeable measures and one cannot be used as a substitute for another. Furthermore, it is shown that commonly used formulations address only a small fraction of the possibilities and a more comprehensive assessment of a system's response to threats is necessary to provide a comprehensive understanding of risk and resilience.

Corresponding author. Tel.: +44 (0)1392 723600; Email: C.Sweetapple@ex.ac.uk

ACCEPTED MANUSCRIPT

(impact). However, there are further options (such as reliability of a specific system component) and, given their common usage, it is useful to identify these too.

Using the Safe & SuRe framework and components identified in Figure 1, reliability can be formulated in six ways, as detailed in Table 2. Not all reliability measures detailed are useful: it is unclear what would represent a failure with respect to the society/economy/environment, and formulations R5 and R6 are unlikely to be used in practice. However, provided failure limits can be defined, reliability is theoretically calculable using any of the formulations listed since it addresses only performance under standard loading (i.e. known knowns: any event which is rare enough to be a known unknown or completely unknown is not considered [...]

[...] formulations (S1-S2, S4 and S7-S12 in Figure 3a and G2 and G5-G6 in Figure 3b).

Haimes (2009) argues that general resilience cannot be calculated, since it requires knowledge of the response to any threat, but this is not always the case. Resilience cannot be calculated under formulation G3 (resilience of society, economy and the environment), as this requires knowledge of unknown consequences, and G1 and G4 (resilience of system and resilience of specified system component) are also incalculable since not all threats which may cause system failures are known. However, the framework presented illustrates that general resilience can be calculated through a middle-state based analysis, as in G2 and G5, as both known and unknown threats result in the same known, finite set of system failure modes. Take, for example, formulation G2, resilience of level of service.
This may be modelled as 'resilience of level of service to any threat', which cannot be calculated since not all threats are known, but also as 'resilience of level of service to any system failure', which can be calculated as all the modes by which the system may fail are identifiable; what threat (known or unknown) causes them is irrelevant since, by evaluating all system failure modes, the potential effects of all threats are captured. Multiple threats can thus be addressed with analysis of a smaller number of system failure modes.

Traditionally, resilience has focussed on the failure of assets; however, asset failure may not necessarily affect level of service provision and may be irrelevant from a consumer perspective (Ofwat 2010). This suggests that, although formulation S1 may be of interest to the asset owners, an impact or consequence based approach (such as G5 or G6) is of greater benefit. The same applies to reliability and risk.

[...] simply whether or not flooding occurs, irrespective of depth, is sufficient). However, it is also argued that risk should provide a measure of the potential losses or adverse effects (Scholz et al. 2011) and it is generally quantified using a function of event frequency and effect magnitude (Blackmore and Plant 2008); this is the interpretation used in this work.

The following equation, adapted from the typical 'probability x consequence' to fit the terminology of this study, is used here to represent risk.

[...]

In calculation of risk, the causal event probability and failure magnitude could be measured at different locations: for example, when calculating the risk of a combined sewer overflow [...] the causal event could be considered the storm event, whereas for the latter either the storm event or the CSO could be considered as the causal event. Hence, for absolute clarity, it is necessary to specify risk to what from what (i.e.
where the effect is measured and the potential cause of that effect considered).

For conventional risk calculation, both the probability of the causal event and the magnitude of its effects (the failure) need to be known and measurable. Accordingly, Figure 3 illustrates all potential combinations of 'failure' and 'causal event' within the Safe & SuRe framework and identifies those which result in a calculable risk formulation. Similarly to resilience, risk formulations in which a specific causal event is identified may be classified as 'specified risk', and those which address risk from any (known or unknown) event can be classified as 'general risk'. Risk cannot be calculated in formulations which include unknown threats in the causal events (i.e. the 'general' formulations, G1-G6 in Figure 3b) since, by definition, these cannot be characterised; this is in contrast to resilience, where it is not necessary to know, for example, the probability of the events that may result in failures. Similarly, risk cannot be calculated in formulations that include unknown consequences in the measured failures (i.e. formulations S3-S6, S9 and S11 in Figure 3a, and G3 in Figure 3b). Furthermore, risk cannot be calculated if the required causal event probability or failure magnitude is unknown despite the existence of the causal event or failure type being known (formulations S10 and S12 in Figure 3a). This leaves five formulations (S1, S2 and S7-S9 in Figure 3a) under which risk [...]
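
As a minimal sketch of the conventional 'probability x consequence' form referred to above (not the study's own equation, which does not appear in this text), risk can be computed as the product of a causal event probability and a failure magnitude. The function name, the event labels and all numerical values below are hypothetical and serve only to illustrate specifying risk 'to what from what'.

```python
def risk(p_causal_event: float, failure_magnitude: float) -> float:
    """Risk as probability of the causal event times magnitude of the failure.

    Assumed generic form; the study adapts this to its own terminology.
    """
    if not 0.0 <= p_causal_event <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if failure_magnitude < 0.0:
        raise ValueError("failure magnitude must be non-negative")
    return p_causal_event * failure_magnitude


# Hypothetical 'risk to what from what' pairs:
# label -> (probability of causal event, magnitude of failure)
events = {
    "CSO spill from storm event": (0.10, 4.0),
    "WWTP overload from population increase": (0.02, 10.0),
}

# One risk value per explicitly specified failure/causal event pair.
risks = {label: risk(p, m) for label, (p, m) in events.items()}
```

Keeping the label alongside each value makes explicit where the effect is measured and which causal event is considered, which is the point made in the paragraph above.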

Reliability and risk
There is widely assumed to be a connection between reliability and risk. However, the nature of this relationship is less clear. Some, for example, consider increasing reliability to be analogous to decreasing risk, with high risk equating to low reliability (Konstantinou et al. 2011). However, others consider reliability a contributor to risk, as it contributes to the probability of failure, but is not the only component (Zio 2013). This corresponds with the risk assessment approach of Kjeldsen and Rosbjerg (2004), and suggests that, although increasing reliability may contribute to a reduction in risk, other factors must also be considered.

Reliability and resilience
Reliability may be considered a prerequisite and/or a component of resilience (Butler et al. 2017, Francis and Bekera 2014), or alternatively a complementary performance indicator (e.g. Kjeldsen and Rosbjerg 2004). As for risk, this suggests that increasing reliability may contribute to efforts to increase resilience but additional measures are also required.

[...] whereas resilience can be calculated under nine (including all for which risk can be calculated). Risk cannot be calculated under formulations S4 and S10-12 (amongst others) since these require knowledge of probabilities that cannot be determined (P(S_{K,x}) and P(I_{K,y})); despite the event type being specified and known in these cases, its probability is not known as it may occur as a result of unknown threats (the probabilities of which are not known).

Risk and resilience
Resilience, however, can be calculated under formulations S4 and S10-12, since this does not require knowledge of the probability of the event(s) resulting in failure.

Risk cannot be calculated under the formulations used for general resilience (G2, G5 or G6) since knowledge of the probability of unknown threats is required in every case. Even if the probability can be expressed as the probability of infrastructure failure or probability of level of service failure (as in G6, for example), it is still affected by unknown threats and cannot be calculated. To be calculable, risk must be specified. General resilience formulations could be considered more useful for detailed system analysis, since they include a measure of the response to any threat, including unknowns, but they are also more challenging to calculate for this very reason.
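
The comparison of calculable formulations can be checked with a short sketch using set operations. The formulation labels are taken as stated in the text (Figure 3a and 3b); the helper function and variable names are illustrative assumptions, and the sketch only encodes those lists rather than deriving them.

```python
def formulations(prefix, numbers):
    """Build labels such as {'S1', 'S2'} from a prefix and an iterable of numbers."""
    return {f"{prefix}{n}" for n in numbers}


# Specified formulations (Figure 3a) stated in the text to be calculable.
resilience_specified = formulations("S", [1, 2, 4, *range(7, 13)])
risk_specified = formulations("S", [1, 2, *range(7, 10)])

# General formulations (Figure 3b): resilience is calculable under some, risk under none.
resilience_general = formulations("G", [2, 5, 6])
risk_general = set()

# Formulations under which resilience, but not risk, can be calculated.
resilience_only = sorted(resilience_specified - risk_specified, key=lambda s: int(s[1:]))

print(len(resilience_specified), len(risk_specified))  # 9 5
print(risk_specified <= resilience_specified)          # True
print(resilience_only)                                 # ['S4', 'S10', 'S11', 'S12']
```

The subset check mirrors the statement that resilience is calculable under every formulation for which risk is calculable, and the set difference recovers S4 and S10-12.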

Reliability, risk and resilience
Based on the definitions and discussion in Section 2.1, the conceptual relationships between reliability, risk and resilience with respect to the probability and magnitude of events addressed are presented graphically in Figure 4. Reliability concerns performance under 'standard loading', which will typically cover the relatively low magnitude, high probability events which are expected to occur within the system's design life. Risk can address more [...] events that are considered too unlikely to be assigned a probability with any degree of certainty or events that cannot be foreseen. Resilience can address the same events as risk assessment but, as it is not necessary to know the probability, can also consider the system response to and recovery from much more extreme events (including so called 'black swans') which, although highly unlikely, may occur.

Figure 4: Conceptual relationships between reliability, risk and resilience with respect to the probability and magnitude of events addressed

2.3 Integrated urban wastewater system case study

Case study
The case study IUWS used (shown diagrammatically in Figure 5)

Reliability, risk and resilience assessment
A brief description of the assessment methodologies is provided here; further detail is available in the Supporting Information.

IUWS reliability assessment
Reliability is assessed under standard conditions (i.e. no population increase) using Eq. 1, where the probability of failure is based on the modelled level of service failure duration.
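
Eq. 1 is not shown in this text, so the following is only a sketch of one common way to derive reliability from a failure duration. It assumes that reliability is the complement of the probability of failure and that this probability is estimated as the fraction of the simulated period spent in level of service failure. The function name, parameter names and example durations are hypothetical.

```python
def reliability(failure_duration: float, total_duration: float) -> float:
    """Reliability as 1 - P(failure), with P(failure) estimated as the
    fraction of the simulated period in level of service failure.

    Assumed form for illustration; not necessarily the study's Eq. 1.
    """
    if total_duration <= 0.0:
        raise ValueError("total duration must be positive")
    if not 0.0 <= failure_duration <= total_duration:
        raise ValueError("failure duration must lie in [0, total duration]")
    return 1.0 - failure_duration / total_duration


# Hypothetical example: 7.3 days of level of service failure in a 365-day
# simulation under standard conditions (no population increase).
r_standard = reliability(failure_duration=7.3, total_duration=365.0)
```

Under this assumption an option with no modelled failure duration has a reliability of exactly 1, in line with the values of 1.000 reported for the high reliability options.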

IUWS risk assessment
Risk is evaluated for population increases of 0 to 15% at 1.5% intervals, using Eq. [...]

[...] noticeable with the R_{duration,mean} resilience indicator). Therefore, these results demonstrate that, when selecting design and operational control options for an IUWS, high reliability and low risk are necessary but not sufficient criteria for high resilience; resilience must be considered as a third and separate objective.

If, in this case study, no benefit of considering the three performance measures as separate objectives had been found, this would not provide sufficient evidence to conclude that (in the wider sense) reliability, risk and resilience do not all need to be considered in the design and operation of IUWSs. However, the observation here that they cannot be used interchangeably is sufficient to demonstrate that the highest reliability and lowest risk options do not necessarily provide the highest resilience.

[...] greatest resilience alone is insufficient and reliability and/or risk must also be evaluated to ensure that the chosen option performs well under a wide range of conditions, including standard loading. This observation is particularly important when it is not possible to implement the option providing the greatest resilience (e.g. due to cost constraints), as there is a greater range in risk and reliability for lower resilience options.

The different levels of resilience, risk and reliability provided by each option are attributed to adjustment in the decision variables presented in Table 3. When analysing the options providing a resilience (R_{duration,mean}) value of 0.85 (as above), the option providing highest [...] a smaller volume of CSOs from subcatchments 1-4 and 7, as well as a greater volume of wastewater being treated, thereby resulting in higher receiving water quality under standard conditions.
However, it only provides the same level of resilience as that provided by a less reliable option with greater CSOs and less wastewater treated, whereas it would intuitively be expected to provide higher resilience than a less reliable option. This may be attributed to it resulting in a greater impact on level of service under extreme population increase as the surcharged WWTP performs poorly and low quality discharge is concentrated at the WWTP outlet instead of distributed along the river by CSOs.

Reliability-, risk- and resilience-based design
Most operational control and design options shown in Figure 6 do not represent realistic solutions, given their poor performance even under standard loading / design conditions. [...]

To illustrate the potential differences between reliability-, risk- and resilience-based design, the decision variable values of the three different options are shown in Figure 8. The high reliability option provides a receiving water quality compliance reliability of 1.000, a risk to receiving water quality from population increase of 0.023 and a receiving water quality resilience to population increase (R_{duration,mean}) of 0.853. The high reliability and low risk option has reliability, risk and resilience values of 1.000, 0.000 and 0.868 respectively, and in the high resilience option the resilience (R_{duration,mean}) is increased to 0.922. It is shown that, whilst there are similarities between the three options (most notably in Q_{maxin} and V_{ST7}), the characteristics of the operational control and design option providing high resilience differ from those providing just high reliability. For example, high reliability can be achieved with an increase in storage volume of 23-31% (V_{ST2}, V_{ST4}, V_{ST6} and V_{ST7}); however, a significantly greater increase in storage volume is required to provide the highest level of resilience. [...]

Note that observations on the relationships between reliability, risk and resilience in the IUWS case study are based on a formulation of resilience that addresses only one known threat. The capability of a middle-state based resilience assessment to address multiple threats, including unknowns (as in formulation G2, for example), has not been exploited. The benefits of a 'high resilience' approach over a 'low risk' approach are expected to be greater if resilience is calculated using a formulation under which risk is incalculable (e.g.
S4, S10 or G2), but demonstrating the benefits is challenging if they are not observable until the occurrence of a previously unknown threat. Even under risk and resilience formulation S2, [...]

CONCLUSIONS
This research has explored the ways in which reliability, risk and resilience may be formulated, identifying possible components and knowledge required for calculation of each and formalising the conceptual relationships between specified and general resilience. A set of corresponding formulations has also been implemented in a case study IUWS to enable investigation into the relationships between reliability, risk and resilience for this system. The following conclusions are drawn:
• Many formulations of both general and specified risk and resilience exist, but not all can be calculated due to the existence of unknown threats and unknown consequences.
• General resilience can theoretically be calculated (under some formulations) whereas general risk cannot. Resilience can, therefore, address responses to a wider range of threats.
• All threats, including both known and unknown, can be addressed with a middle-state based resilience analysis which focusses on the level of service response to system failures. Risk cannot be calculated on the same basis since the probability of system failure is affected by the probability of unknown threats.
• Consideration of resilience in addition to risk can be beneficial even when only considering specified threats, as demonstrated in the case study. Lowest risk solutions do not necessarily provide the highest specified resilience.