Identify, analyse and mitigate—Quantification of technical risks in PV power systems

Technical risks are important criteria to consider when investing in new and existing PV installations. Quantitative knowledge of these risks is one of the key factors for the different stakeholders, such as asset managers, banks or project developers, to make reliable business decisions before and during the operation of their PV assets. Within the IEA PVPS Task 13 Expert Group, we aim to increase the knowledge on methodologies to assess technical risks and mitigation measures in terms of economic impact and effectiveness. The developed outline provides a reproducible and transparent technique to manage the complexity of risk analysis and processing in order to establish a common practice for professional risk assessment. Semi-quantitative and quantitative methodologies are introduced to assess technical risks in PV power systems, and examples of common technical risks are described and rated in the newly created PV failure fact sheets (PVFS). Besides the PVFS based on expert knowledge and expert opinion, an update on the statistics of the PV failure degradation survey is given. With the knowledge acquired and data collected, the risk and cost-benefit analysis is demonstrated in a case study that shows methods for prioritising decisions from an economic perspective and provides important results for risk management strategies.


| INTRODUCTION
PV risk analysis serves to identify and reduce the risks associated with investments in PV projects. The key challenge in reacting to failures, or avoiding them at a reasonable cost, is the ability to quantify and manage the various risks. There are several interpretations of the concept of risk, but in general, risk can be defined as the probability of a failure multiplied by the consequences of that failure. The common approach to evaluating technical risks is to apply a classical failure mode and effects analysis (FMEA). 1 It is widely used in the automotive, aerospace and electronics industries to identify, rank and mitigate potential failures, and it allows the root causes and impact of a failure to be analysed.
The disadvantage of this approach is that the risk is evaluated in a qualitative way, so it cannot provide a framework for calculating the economic impact. Thus, a cost-based FMEA was proposed in 1993 2 and enhanced in 2003. 1 Several applications of cost-based FMEA can be found in the literature, 3 often related to automotive or wind energy. 4 In 2017, a cost-based FMEA was presented within the Solar Bankability Project 5,6 as a first attempt to apply a cost-based FMEA to the PV sector. The metric cost priority number (CPN) was applied as one key performance indicator (KPI) for the risk assessment of PV investments. In Oviedo Hernández et al., 7 the CPN method was further developed with a focus on the needs of large O&M operators. In the H2020 project TRUST-PV, 8 launched in 2020, the improved cost priority number methodology is used to calculate the cost, and thereby the financial impact, of individual PV system issues described in O&M tickets, which were provided by several major O&M companies and asset managers across Europe. 9 In this context, another widely used quantification method is the reliability, availability and maintainability (RAM) analysis. The RAM analysis aims to identify any significant performance losses and then recommend improvements to the maintenance strategy. IEC TS 63265 - "Reliability practices for the operation of photovoltaic power systems", coordinated by Roger Hill and with publication foreseen in 2022, will provide a further toolkit of methods with which different stakeholders can demonstrate the effectiveness of reliability-increasing measures from a technical and economic point of view.
The aim of this work is to increase the knowledge of methodologies to assess technical risks and mitigation measures and to investigate the most important risks by collecting case studies and updating the database with the acquired information.
As a first step, we reviewed scientific literature and technical reports to compare and evaluate the following methods for quantifying the impact of technical risks:
a. Failure modes and effects analysis (FMEA)
b. Reliability, availability and maintainability (RAM) analysis
c. Cost priority number (CPN) method
The second part deals with the systematic approach to identify the most important technical risks. The risk database includes the range of affected components, the description of causes and consequences, failure rates, probability of occurrence, the impact on KPIs and the recommended control and mitigation actions. Several examples of common technical risks have been described and assessed in the newly developed PV failure fact sheets (PVFS). The taxonomy is based on the TRUST-PV Risk Matrix, a technical risk matrix developed under the H2020 project of the same name and the previous H2020 project Solar Bankability. The risk matrix covers all failures and events in the operational phase of a PV plant and is subdivided according to components, subcomponents and individual events/failures. It can be downloaded from https://trust-pv.eu/reports/risk-matrix/.
In addition to the PVFS collection, we have also updated the PV failure degradation sheets (PVDS) presented in Köntges et al. 10,11 These require more detailed measured input data but are able to provide statistics on degradation rates and power losses of PV systems based on failure types. These statistics serve as a basis for risk models that can be used to assess the associated risk and the economic impact over the project lifetime of a PV plant. In addition to knowledge of the individual risks, the economic impact of these risks is the driving factor for further analysis and decisions.
In a final step, we included the costs of mitigation measures in a cost-benefit analysis to determine the best strategy from a technical and financial perspective. With the acquired knowledge and collected data, the risk and cost-benefit analysis is demonstrated in a case study that shows methods for prioritising decisions from an economic point of view and provides important results for risk management strategies.

| COMMON PRACTICE
According to the Project Management Body of Knowledge (PMBOK) guide, a set of standard terminology and guidelines for project management, 12 "Risk quantification is a process to evaluate identified risks to produce data that can be used in deciding a response to corresponding risks." This implies that the first step is to identify the technical risks and subsequently determine the probability of occurrence and the impact on the energy yield. Previous works within IEA PVPS Task 13, 10,11 Moser et al. 5 and the PV failure fact sheets in Chapter 3.1 have identified and described the most common technical failures that could impact the performance of a PV power plant. In addition to failures, there are also other technical risks during operation caused by varying performance loss rates, as analysed in previous studies. 13,14 How to respond to these risks with preventive or corrective actions is discussed by Jahn et al. 15,16 In the following, these evaluation processes are classified into semi-quantitative and quantitative methods with a focus on photovoltaics. This chapter gives an insight into the common methods used, shows how technical risks in PV plants can be evaluated and minimised and provides recommendations for best practices.
The semi-quantitative methods use human problem-solving strategies, based on expert knowledge and expert opinion. The best way to use such a knowledge-based method is to conduct online or offline workshops where experts can discuss the identified risks and assign values to them. They can prioritise the identified risks using a pre-defined rating scale. Risks are scored based on their probability or likelihood of occurrence and their impact.

| FMEA
One typical approach is a classic failure modes and effects analysis. 17 In the FMEA, each identified risk is evaluated for its Severity (S), Occurrence (O) and Detectability (D), which describe the impact on performance, the probability of appearance and how easy it is to detect the risk. Each of these evaluation parameters is usually rated with numbers on a scale of 1 to 10.
The risk priority number (RPN) is the product of the three ratings, RPN = S × O × D. With the resulting RPN, the evaluated risk can be ranked and compared with other risks. Figure 1 gives an example of FMEA rating of PV module failures. The disadvantage of this approach is that further usage, for example within a financial model, is limited. 5
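As a minimal sketch of the ranking step, the snippet below computes the RPN as the product of S, O and D (as stated in the Figure 1 caption) and sorts a few risks by it. The failure names and ratings are invented placeholders for illustration and are not the values shown in Figure 1; the range check assumes the 1 to 10 scale mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRating:
    """One FMEA entry; all names and ratings used below are hypothetical."""
    name: str
    severity: int       # S: impact on performance
    occurrence: int     # O: probability of appearance
    detectability: int  # D: detectability rating

    def __post_init__(self):
        # Assumption: every factor is rated on the 1-10 scale from the text.
        for value in (self.severity, self.occurrence, self.detectability):
            if not 1 <= value <= 10:
                raise ValueError("ratings are assumed to lie on a 1-10 scale")

    @property
    def rpn(self) -> int:
        """Risk priority number: RPN = S * O * D."""
        return self.severity * self.occurrence * self.detectability


def rank_by_rpn(ratings):
    """Return the ratings sorted from highest to lowest RPN."""
    return sorted(ratings, key=lambda r: r.rpn, reverse=True)


# Illustrative placeholder ratings, not taken from the paper.
example = [
    FmeaRating("glass breakage", severity=8, occurrence=3, detectability=2),
    FmeaRating("hot spot", severity=6, occurrence=5, detectability=4),
    FmeaRating("soiling", severity=3, occurrence=9, detectability=1),
]
ranked = rank_by_rpn(example)
for r in ranked:
    print(f"{r.name}: RPN = {r.rpn}")
```

Because the RPN is a product of ordinal scores, it is suitable for ranking but carries no unit, which is the limitation for financial models noted above.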

| CPN
Quantitative methods involve assessing the probability and impact of risks using numerically based techniques, such as simulation and fault tree analysis. The results provide information about the effects of the identified risks and represent a given reality in the form of a numerical value that can be utilised in economic and financial models for quantitative decision making.
The CPN was developed in the early 2000s to address the fact that FMEA could not be used for quantitative financial assessments; therefore, cost-based FMEA was proposed. The FMEA community had already developed the risk priority number (RPN). When large projects, such as the "Next Linear Collider," were being designed and priced using full lifecycle analysis, 18 full lifecycle costs needed to be taken into account, considering not just construction but also O&M, repairs, loss of production time and FMEA. 19 In 2003, 20 this was formalised as cost-based FMEA, 1 as an extension of the RPN used previously. 3 In the ensuing years, the utility of connecting FMEA to lifecycle costs and financial decision making was introduced in many engineering fields, 4,21 with Kahrobaee et al. 22 introducing CPN in a lifecycle and FMEA analysis of wind turbine systems.
For PV systems, the CPN enables economic quantification of PV degradation modes and other performance-impairing effects in operating PV plants. It has therefore enabled risk assessments of investments in PV power plant projects. 5 The CPN methodology assessed the economic impact on PV projects based on factors such as performance loss and downtime. In this way, a cost-based failure mode and effects analysis methodology for the PV sector has been developed in the form of the CPN. In its initial form, it was developed using theoretical scenarios to calculate extreme values for the CPN metric, expressed in €/kWp/year. All phases of a PV power plant's life cycle (from product testing to decommissioning) have been included. The methodology helps to identify and classify technical risks and their economic impact by assigning a cost metric that, based on collected statistics, supports preventive and corrective measures, which would then lower the impact of failures on the availability and performance of a PV plant. This made it possible to create a database that gives indicators of the likelihood and severity of failures. Such results could then be used to improve O&M activities.
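The text describes the CPN only qualitatively at this point, so the following is a hedged sketch rather than the published formula. It assumes, for illustration, that a CPN in €/kWp/year can be formed as the value of the energy lost through downtime and performance loss plus the repair cost, normalised to the installed capacity. The function name, the parameter names and the input values are all hypothetical.

```python
def cost_priority_number(energy_loss_kwh_per_year: float,
                         tariff_eur_per_kwh: float,
                         repair_cost_eur_per_year: float,
                         capacity_kwp: float) -> float:
    """Return an illustrative CPN in EUR/kWp/year.

    Assumption (not taken from the source text):
    CPN = (value of lost energy + repair cost) / installed capacity.
    """
    if capacity_kwp <= 0:
        raise ValueError("capacity_kwp must be positive")
    lost_income = energy_loss_kwh_per_year * tariff_eur_per_kwh
    return (lost_income + repair_cost_eur_per_year) / capacity_kwp


# Hypothetical inputs for a 1 MWp plant.
cpn = cost_priority_number(
    energy_loss_kwh_per_year=50_000,
    tariff_eur_per_kwh=0.10,
    repair_cost_eur_per_year=2_000,
    capacity_kwp=1_000,
)
print(f"CPN = {cpn:.2f} EUR/kWp/year")
```

Unlike the RPN, a metric of this kind has a monetary unit, so it can be summed over failures and fed directly into a financial model.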

| RAM
Technical risks and the reliability of a component are complements of each other, as long as they cover the same sample space. In this context, another widely used quantification method is the reliability, availability and maintainability (RAM) analysis. RAM analysis aims to identify any significant performance losses and then recommends improvements to the maintenance strategy. In this bottom-up approach, a reliability block diagram (RBD) or a fault tree analysis (FTA) is recommended to determine the effects of the failure of individual components (Figure 2).
In RAM modelling, the reliability term R is defined as the probability that a system or component performs adequately within a given time.
The probability density function (PDF) of failures, f(t), with increasing lifetime is expressed by an exponential, normal, Weibull or lognormal distribution. Weibull distributions are applicable to a broad range of failure modes and mechanisms. The normal distribution is preferred for items that have a wear-out mechanism, such as bearings or motors.
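To illustrate why the Weibull distribution covers a broad range of failure behaviour, the sketch below implements the standard two-parameter Weibull density and the corresponding reliability function. The shape and scale values are hypothetical and are not the fitted parameters of Table 1; with a shape of 1 the Weibull reduces to the exponential distribution.

```python
import math


def weibull_pdf(t: float, shape: float, scale: float) -> float:
    """Two-parameter Weibull failure density f(t), defined here for t > 0."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R(t) = exp(-(t / scale) ** shape): probability of surviving beyond t."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be non-negative; shape and scale positive")
    return math.exp(-((t / scale) ** shape))


# Hypothetical parameters: a shape above 1 models wear-out behaviour,
# a shape below 1 models early-life failures.
for shape in (0.8, 1.0, 2.5):
    r = weibull_reliability(10.0, shape=shape, scale=25.0)
    print(f"shape={shape}: R(10 years) = {r:.3f}")
```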
Derived from Sayed et al., 24 the best-fit PDFs for the different components are shown in Table 1.
The failure rate λ is the frequency of component failure. The mean time to failure (MTTF) of a component defines the expected life of non-repairable items.
FIGURE 1 Example of rating of PV module failures based on classic FMEA. The rating of the technical risks was based on the statistics of failure reports from TÜV Rheinland. RPN is the product of S, O and D where each factor is an integer between 0 and 10. 5
Availability (A) is defined as the percentage of time that the plant was successfully operating. A is the MTTF divided by the total operating time and can be calculated with MTTF and mean down time (MDT), as follows:
$$A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MDT}}$$
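The relation between these quantities can be sketched in a few lines. The example assumes a constant failure rate (exponential distribution), for which R(t) = exp(-λt) and MTTF = 1/λ, and computes availability as MTTF divided by the sum of MTTF and MDT, in line with the definition above. The component data are hypothetical.

```python
import math


def reliability_exponential(t_hours: float, failure_rate_per_hour: float) -> float:
    """R(t) = exp(-lambda * t); valid only for a constant failure rate."""
    return math.exp(-failure_rate_per_hour * t_hours)


def mttf_exponential(failure_rate_per_hour: float) -> float:
    """MTTF = 1 / lambda for the exponential distribution."""
    return 1.0 / failure_rate_per_hour


def availability(mttf_hours: float, mdt_hours: float) -> float:
    """A = MTTF / (MTTF + MDT): share of time in successful operation."""
    return mttf_hours / (mttf_hours + mdt_hours)


# Hypothetical component: one failure per 50,000 h, 100 h mean down time.
lam = 1.0 / 50_000
mttf = mttf_exponential(lam)
print(f"MTTF = {mttf:.0f} h")
print(f"R(8760 h) = {reliability_exponential(8760, lam):.3f}")
print(f"A = {availability(mttf, 100):.4f}")
```

For other distributions, such as the Weibull, the MTTF is no longer simply 1/λ, so the first two functions apply only under the constant-rate assumption.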

| RISK DATABASE
According to the PMBOK guide, 12 the Risk Database (RDB) is the central repository for all information regarding the identified risks. In terms of technical risks, the RDB provides the range of affected components, the description with causes and consequences, failure rates, the probability of occurrence, the impact on KPIs and the recommended control and mitigation actions. It should be updated and maintained as a growing data hub through all phases of the project. In this chapter, we present a systematic approach to identify the main technical risks, define the most important risk parameters and collect these failure, loss and occurrence data.

| PV failure fact sheets (PVFS)
The PV failure fact sheets (PVFS) summarise some of the most important aspects of single failures. The target audience of the PVFS is PV planners, installers, investors, independent experts and insurance companies, as well as anyone interested in a brief description of failures with examples, an estimation of risks and suggestions on how to intervene or prevent these failures. Front delamination (Figure 3) is taken as one example of the 30 PVFS created. The complete list of all PVFS (Table 2) is publicly available in the Annex of Herz et al. 25 The format of the PVFS is based on the failure description presented within the H2020 Solar Bankability project. 6 A rating system for the estimation of the severity of a failure is used here, which simplifies the approach proposed within the IEA PVPS Task 13 10 by implementing the rating system proposed by the Sinclairs. 26 The correlation between the different failures is highlighted in the text by using bold characters. Each PVFS is structured into one to three pages. The first page is a descriptive page, whereas the remaining pages (Figure 4) contain examples composed of a picture, a legend and an estimation of its severity. The first page is structured as follows:

Component
The PV system components are divided into

Impact
Description of the impact on the safety, performance and reliability of the component and system and its severity. For every failure, a range of possible ratings is given, one for safety and one for performance.
A failure is defined as a safety failure when it endangers somebody who is applying or working with PV modules or simply passing the PV modules. Three categories are defined in Figure 5.

FIGURE 3 First page of PVFS example with general information
A failure is defined as a performance failure when it impacts the performance and/or reliability of a system. Five categories are defined in Figure 6. They range from 1 (low severity) to 5 (high severity).
For each category, the expected loss is estimated at the component level, assuming that no mitigation measure is implemented. It can range from no power degradation (0%), through power degradation below the detection limit (<2-3%), power degradation within warranty (<0.7-1%/year) and power degradation outside warranty (>0.7-1%/year), to catastrophic power degradation (>3%/year).

Mitigation
Description of the corrective actions to be taken in the short and medium term when a failure is detected and of the preventive actions to be implemented to avoid the failure from the beginning. Preventive actions are separated into recommended actions, representing the minimum requirement for small residential systems, and optional actions for large-scale systems.
The general rule for intervention in case of a failure is: All components with a direct safety risk or a performance severity of 5, highlighted in red, should be replaced or repaired. Regular inspections should be performed to monitor the status of the components that have not been replaced or repaired.
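The intervention rule can be written down as a small decision function. This is only a sketch: the three safety categories follow the descriptions of Figure 5, but the enum member names and the returned action strings are hypothetical labels chosen here for readability.

```python
from enum import Enum


class SafetyCategory(Enum):
    """Three safety categories as described for Figure 5 (names are hypothetical)."""
    NO_EFFECT = 1               # failure has no effect on safety
    RISK_IF_SECOND_FAILURE = 2  # danger only if a follow-up or second failure occurs
    DIRECT_RISK = 3             # failure can directly cause fire, shock or physical danger


def recommended_action(safety: SafetyCategory, performance_severity: int) -> str:
    """Apply the general rule: replace or repair on direct safety risk or severity 5."""
    if not 1 <= performance_severity <= 5:
        raise ValueError("performance severity is rated from 1 to 5")
    if safety is SafetyCategory.DIRECT_RISK or performance_severity == 5:
        return "replace or repair"
    return "monitor by regular inspection"


print(recommended_action(SafetyCategory.NO_EFFECT, 3))
print(recommended_action(SafetyCategory.DIRECT_RISK, 1))
```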

Example PVFS: Front delamination
The fact sheet FS1-3: Front delamination, which covers delamination of the encapsulant, is taken here as an example to further explain the FS structure and rating system.
The first section of the sheet describes the appearance, that is, how to recognise a specific failure, and which detection methods are available.
Delamination is generally easily detectable by visual inspection (VI) of the modules from the front. Insulation measurements (INS) can give a hint of severe delamination, but they are not the first method for detecting early delamination, which is why this method is put in brackets.
The second section describes the origin, that is, in which phase of the lifetime of a PV system the failure occurs and what the main causes are. Delamination problems mainly have their origin in the quality of the raw material, the manufacturing process and/or the environmental factors to which the modules are exposed during their operational lifetime. Transport and installation do not generate any delamination problems.
The third section describes the impact the failure has on the safety and performance of the component and PV system. Below the general description, the severity rating according to Figures 3 and 4 is given. The severity rating on the first page gives the full range of possible ratings observable in the field and how the failure can evolve over the whole lifetime of a PV system. In contrast, the rating in the examples gives a snapshot of the gravity of the failure for a specific case at a certain time. The pictures are taken from literature or case studies, give only a partial picture of the situation and are used here to explain the potential levels of impact.

| PV failure degradation sheets (PVDS)
Besides the PVFS collection, we provide an update on the statistical risk data of the PV failure degradation sheet (PVDS) survey developed in Köntges et al. 11 It requires a large amount of measured input data, but it is able to generate statistical data on degradation rates and power loss of PV systems based on failure types. Due to the high requirements of the PVDS, much less input data can be collected. In the following, we introduce the collected data, the way the data are analysed and the analysis results.
The failure data are collected in an Excel sheet, which is sent to system owners, experts, installers or manufacturers. Some data are also collected from scientific publications or an Australian internet survey.

FIGURE 4 Remaining pages of a PVFS contain examples composed of a picture, a legend and an estimation about its severity

FIGURE 5 Safety category
Safety category Description
Failure has no effect on safety.
Failure may cause a fire (f), electrical shock (e) or a physical danger (m) if a follow-up failure and/or a second failure occurs.
Failure can directly cause a fire (f), electrical shock (e) or a physical danger (m).

FIGURE 6 Performance category
Performance category Description
1 The defect has no direct effect on performance.
2 The defect has a minor impact on performance.
3 The defect has a moderate impact on performance.
4 The defect has a high impact on performance.
5 The defect has a catastrophic impact on performance.
The survey structure was first presented in the IEA PVPS Task 13 report "Assessment of Photovoltaic Module Failures in the Field." 11 The plain survey and the survey explanation are available for download. 27 The survey is structured into system components, as described in Chapter 3.1. All system components may have various predefined failures. For each failure, a power loss and a safety failure may be given.
Furthermore, for each system, a Koeppen-Geiger climate zone must be selected. The Koeppen-Geiger climate zones are shifting with ongoing climate change. We used the Koeppen-Geiger map calculated by Rubel 28 for the time period 1976-2000 as classification classes.
Compared with the survey structure first presented in Köntges et al., 11 we added two new failure categories for PV modules: LID/LeTID degradation and potential induced delamination. 29 Furthermore, it is now possible to add all three letters of the Koeppen-Geiger classification to the survey, compared with one letter in the first version. The translation tool from "geo data" to "Koeppen-Geiger climate zones" 30 helps to find the correct classification for each position in the world.
Since the last failure data evaluation, 11   type "burn marks" has been detected more frequently. For sudden events, also shown in Figure 8, the failures glass breakage and dust soiling dominate the failure statistics.
Figure 9 shows the power loss impact of sudden events on PV system performance. In temperate climates, documented glass breakage events lead to a loss of 1% to 2% of a system's power, with one exception in the dataset. These events seem to occur everywhere but do not appear to be severe for the whole system. Dust soiling appears everywhere except in tropical climates. In temperate climates, the impact is at most 7% of the total system power, whereas up to 15% power loss occurs in dry climates and over 25% in continental climates. As expected, deformation of the PV module frame due to snow load occurs only in continental and polar climates.
FIGURE 9 Power loss of sudden failure events on the total power of the PV system

| Risk analysis
Risk analysis enables users with statistical and reliability data to develop and run scenarios in which PV performance and costs are affected by components that can fail.
How the risk quantification method can be applied in practice is demonstrated using a 10 MW PV plant with PID-affected PV modules. The assumptions in Table 3, derived from an operating PV plant, serve as input for this case study. Financial parameters such as depreciation, interest or taxes are not considered.
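The annotations of Figure 14 for the case without mitigation (a CPN of 151 €/kWp after 5 years and 745 €/kWp after 10 years) can be cross-checked with simple arithmetic: a CPN in €/kWp multiplied by the installed capacity gives the revenue loss. The sketch below assumes that the nominal 10 MW corresponds to 10,000 kWp. The investment volume is not stated in the text, so it is back-calculated here from the reported percentage shares and should be read as an inferred value only.

```python
CAPACITY_KWP = 10_000  # assumption: the 10 MW plant equals 10,000 kWp


def revenue_loss_eur(cpn_eur_per_kwp: float, capacity_kwp: float = CAPACITY_KWP) -> float:
    """Revenue loss implied by a cumulative CPN given in EUR/kWp."""
    return cpn_eur_per_kwp * capacity_kwp


# Values reported in the Figure 14 annotations.
loss_5y = revenue_loss_eur(151)
loss_10y = revenue_loss_eur(745)

# Inferred, not reported: investment implied by the stated shares of 7.6% and 37.3%.
implied_investment_5y = loss_5y / 0.076
implied_investment_10y = loss_10y / 0.373

print(f"Revenue loss after 5 years:  {loss_5y / 1e6:.2f} million EUR")
print(f"Revenue loss after 10 years: {loss_10y / 1e6:.2f} million EUR")
print(f"Implied investment: {implied_investment_5y / 1e6:.1f} and "
      f"{implied_investment_10y / 1e6:.1f} million EUR")
```

Both shares point to an investment of roughly 20 million €, so the reported CPN values, revenue losses and percentages are mutually consistent within rounding.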

| Cost-benefit analysis
The CPN methodology allows the economic impact of failures on the LCOE and on the business models of PV projects to be estimated. It has been developed not only to determine the economic impact of technical risks but also to assess the effectiveness of mitigation measures. Specific failures, for example soiling or potential induced degradation (PID), have to be examined in order to draw recommendations on how to mitigate their economic impact. Some failures can be prevented or mitigated through specific actions at different project phases (e.g., for PID); other events (e.g., soiling) can be prevented or mitigated through a more generic action. For example, the monitoring of performance or visual inspection can be considered as generic mitigation measures that can have a positive impact on the reduction of the CPN of many failures. In practice, it is important to understand how mitigation measures can be considered as a whole in order to calculate their impact and thus assess their effectiveness.
The cost-benefit analysis is also a tool to determine whether the benefit of one option will justify its costs. It can identify the best mitigation options from an economic point of view. The analysis continues the case study presented in Chapter 4.1. Three mitigation scenarios are defined:
• No-mitigation option: no intervention in the current plant operation
• Mitigation option 1 - PID box: installing PID boxes and allowing the performance of the PV modules to recover to a certain level
• Mitigation option 2 - PID box and partial repowering: installing PID boxes and replacing very low-performing PV modules with high-power modules
The expected annual energy yields for the three scenarios are illustrated in Figure 13. After the mitigation measures are applied in year 5 of operation, the energy yields show a steep rise.
The expected PV plant output after 20 years of operation is calculated at 45% rated energy output for the no-mitigation scenario and at 84% and 91% for mitigation options 1 and 2, respectively.

FIGURE 13 Twenty-year forecast for three mitigation scenarios; the repowering is carried out with a higher module power class.

The cost-benefit analysis also takes the associated costs of the available options into account, as described in Table 4. The costs are derived from a real project and include basic O&M costs for the no-mitigation scenario and the costs of additional detection actions and equipment for mitigation options 1 and 2 when installing PID boxes or replacing PV modules. The impact on the annual cash flow is demonstrated in Figure 14. In the reference scenario, the monetary yield of the PV project after 20 years is expected to be around 225% of the CAPEX (dashed line). If no mitigation measures are taken, the lowest result of around 115% of CAPEX is forecasted. Mitigation options 1 and 2 result in 6.0% and 4.6% below expectations, respectively, which both represent successful project results. It can be concluded that both mitigation options should be considered and are preferable to taking no action. However, the additional investments in year 5 of operation for option 2 are significantly higher, by a factor of 8.

TABLE 4 Costs of mitigation scenarios
Scenario: Cost (k€)
No-mitigation: 15
PID box: 238
PID box and partial repowering: 3233

FIGURE 14 Annual cumulative cash flow of the mitigation scenarios with CPN and loss of revenue after 5 and 10 years of operation if no action is taken. Annotations: CPN (5 years) = 151 €/kWp, revenue loss = 1.51 million €, 7.6% of investment; CPN (10 years) = 745 €/kWp, revenue loss = 7.45 million €, 37.3% of investment.

5 | CONCLUSIONS
Best practice guidelines to improve the operation of PV power systems are often only applied as long as the recommended actions have advantages for the executors (the EPCs and O&M companies) and for the investors, whose main focus is on low risks and maximum profit from an economic point of view. This leads to the key challenge: How can you demonstrate the effectiveness of the measures and justify their application? The technically best solution is not always the best from an economic or safety point of view. Before you are able to evaluate the cost-benefit, the following question arises: How to quantify the basic impact of technical risks?
In order to answer these questions, we introduced semi-quantitative and quantitative methodologies to assess technical risks in PV power systems and provided 30 examples of common technical risks described and rated in the newly created PV failure fact sheets. 25 Besides the PVFS based on expert knowledge and expert opinion, an update on the statistics of the PV failure degradation survey, developed in Köntges et al., 11 was given. With the knowledge acquired and data collected, the risk and cost-benefit analysis was demonstrated in one case study that showed methods for prioritising decisions from an economic perspective and provided important results for risk management strategies.
Technical risks from a reliability perspective, as introduced in the RAM analysis, are addressed in IEC TS 63265 - "Reliability practices for the operation of photovoltaic power systems", coordinated by Roger Hill, with publication foreseen in 2022. Its motivation is to provide a toolkit describing many methods by which different stakeholders can demonstrate the effectiveness of reliability-increasing measures from a technical and economic point of view.
However, having provided an overview of quantification methods, we conclude that more standardisation is required. Risk definitions are not fully structured and event databases (solar logbooks) are not harmonised. Data analysis would benefit from the use of a standardised language and metadata formats. The development of an automated and therefore time-efficient solution for extracting key parameters from maintenance tickets is required to gain statistical insights from a large number of PV plants. The development of a software tool for field technicians is also recommended; it would allow the precise and error-free recording of the standardised parameters needed to calculate the O&M contractors' KPIs, which are necessary for an efficient implementation of the methodology. 7 In summary, O&M field practices must move away from manual input of tickets in text format and adopt a more standardised approach in which human intervention is limited.
In the H2020 project TRUST-PV, 8 launched in 2020, the improved cost priority number approach is the basis for the creation of a large database of PV system data, coming from several major O&M companies and asset managers across Europe, for the calculation of failure rates. It is thereby a direct continuation in which the improved cost priority number methodology will be automated in terms of acquiring failure data, calculating power losses and determining the related costs.
The output will later be integrated into the design of newly commissioned PV plants and into a decision support system platform for operating plants. All things considered, we believe that data-driven evaluation of techno-economic performance indicators is key to taking decision support on LCOE to the next level.

1. PV module (including junction box)
2. Cables and interconnectors (at module, string and combiner box level)
3. Mounting (structure, clamps and screws)
4. Inverter

Defect
Short name describing the failure/defect.

Appearance
Description of what the defect looks like.

Detection
Description of methods that can be used to detect the failure. Detection methods in brackets are secondary methods, which do not detect the failure with absolute certainty or which can be used in addition to other methods. The following abbreviations are used:

Origin
Description of the failure and its main causes and origin (1. Material and production, 2. Transport and installation, 3. Operation and maintenance).

Figure 7. Most data are from Europe. In total, data from all six continents are available. Although the market share of mono- and multicrystalline silicon solar wafers has switched from multi market domination to mono market domination, the main analysed technologies are still multicrystalline silicon wafer-based solar cells. The data collection includes PV systems with installation years from 1982 to 2018. Over 90% of the data are from PV systems installed between 2005 and 2018.

Figure 8 shows the frequency distribution for PV module failures with an impact on the power generation of the PV systems. The distribution is split into failures that lead to a degradation and suddenly occurring failures. Most reports on failures with power loss are given

Figures 10 and 11 show the degradation rate for the affected system parts and the whole system for various failures sorted by climatic zones. The additional data support the former statements on the degradation rates of the failure types presented in Köntges et al. 11

FIGURE 10 Box plot of degradation rates dx of PV modules affected by failures x, sorted by climatic zones. The numbers show the quantity of data per failure in the database. The cross shows the mean degradation rate. The boxes include 50% of all values; the whiskers show the full range of existing values. The middle line in the box shows the median.
FIGURE 11 Degradation rates of the whole PV system sorted by climatic zones. The numbers show the quantity of data per failure in the database.

Taking the behaviour of the identified root cause into account, the potential future performance loss rate (PLR) is expected to increase further with an expected saturation of 50%. After this value is reached, the PLR is expected to stagnate at a constant level of 0.7% per year. This prediction of performance development for 20 years of operation is shown together with the exceedance probabilities P10 and P90 for a confidence level of 68.2% in Figure 12. Taking CAPEX, OPEX and annual revenues into account, the project's financial profit after 20 years of operation is 48% below original expectations for the defined scenario without mitigating actions.

TABLE 3 Metadata of investigated PV plant
FIGURE 12 Energy forecast of no-mitigation scenario

TABLE 1 Best-fit PDFs for the components of a PV plant adapted from Sayed et al. 24
FIGURE 2 Examples of reliability block diagram (top) and fault tree (bottom). Adapted from Baschel et al. 23