A systematic review of multivariate uncertainty quantification for engineering systems

Engineering systems must function effectively whilst maintaining reliability in service. Predicting maintenance costs and asset availability involves varying degrees of uncertainty from multiple sources. Previous reviews in this domain have assessed cost uncertainty and estimation for the entire life cycle. This paper presents a systematic review to investigate existing methodologies and challenges in uncertainty quantification, aggregation and forecasting for modern engineering systems through their in-service life. Approaches to forecast uncertainty are hindered chiefly by the quality of available data, experience and knowledge. A total of 107 papers were analysed to answer three research questions based on the scope, through which two core research gaps were identified. An integrated combination of identified approaches will enhance rigour in uncertainty assessment and forecasting. This review contributes a systematic identification and assessment of current practices in uncertainty quantification, alongside scientific methodologies to quantify, aggregate and forecast quantitative and qualitative uncertainties, to better understand their impact on cost and availability and aid decision making throughout the in-service phase. © 2021 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/


Introduction
The increasing complexity and dynamic nature of engineering systems drives an inherently high level of uncertainty. Many such complex engineering systems (CES) consist of multiple component parts or subsystems that interact in a collective manner not representative of individual parts [1][2][3]. Examples of complex systems range from biological organisms, global climate and meteorology to bridges, ships and aircraft. Engineering systems are expected to carry out intended functions whilst maintaining reliability in service. It is therefore increasingly challenging to confidently predict availability, cost and performance in various operating conditions [1,4,5]. Decisions made concerning these factors are shrouded in uncertainty, requiring significant experience and expertise, as well as the use of often outdated equipment data. This is typically managed under through-life product-service system (PSS) contracts, where the client makes use of a product in their possession but does not take ownership [6][7][8][9][10].
This review is motivated by the requirement for scientific approaches to quantify, aggregate and forecast technical engineering uncertainties for complex and non-complex engineering systems. These uncertainties impact the ability to effectively carry out maintenance tasks given available techniques and technology to required industry standards. Examples include uncertainties in degradation, no-fault found, obsolescence and failure rates [11][12][13].
It is therefore hypothesised that the utilisation of the above approaches, considering a multivariate combination of measured, recorded data (quantitative) and experience-driven opinion or human factors (qualitative), will increase confidence and rigour in determining the impact of uncertainty over time. Such approaches should be applicable to various scenarios where data may be incomplete, inconsistent, inaccessible, and reliant on expert opinion [14][15][16]. In light of dramatically increasing data volumes and computational capability in engineering systems, rigorous machine learning algorithms should be incorporated to intelligently forecast uncertainty estimates [17,18].
Previous reviews in this domain have considered the role of uncertainty estimation in life cycle costing under PSS [14,[19][20][21]. The in-service phase covers the largest portion of an asset's lifecycle between contract bidding and disposal. Many approaches to aggregate different types of uncertainty consider a summation of best and worst-case scenarios represented by probability distributions to define boundaries for likely outcomes [14]. Inadvertently disregarding the space between these scenarios may result in under or over-estimation and data distortion, adversely impacting decision-making.
This paper presents a systematic literature review (SLR) to investigate distinct approaches in uncertainty quantification and aggregation that can be applied in a real-world context, in conjunction with how uncertainty can be forecast for the in-service phase for engineering systems. Both complex and non-complex engineering systems are considered in this review, with a focus towards CES owing to their increasing relevance within the research scope. The objectives and resulting research questions (RQs) to achieve this are depicted in Section "Research definition". The review follows the 4-stage analytical framework composed by Booth et al. [22] to conduct an SLR: search, appraisal, synthesis and analysis (SALSA). This generic approach is well validated and can be applied under varying conditions to provide a clear analysis of literature published in the field of uncertainty and identify research gaps [23][24][25].
The primary contribution of this review is the combined consideration of scientific methodologies to quantify (numerical expression of an entity), aggregate (collation of entities) and forecast (likely future outcomes) quantitative and qualitative uncertainties to better understand their impact on cost and availability to aid decision making throughout the in-service phase. A total of 107 papers were analysed to answer three research questions, through which two core research gaps were identified.
The paper is structured as follows: Section "Topology of engineering systems and uncertainty" discusses a topology of engineering systems and uncertainty, including classification and recognised standards. Section "Research definition" defines the research scope and subsequent RQs for the review. Theory and implementation of the SALSA methodology fulfils Sections "Review methodology: Appraisal and synthesis" and "Analysis of synthesised data"; Section "Review methodology: Appraisal and synthesis" details the appraisal and synthesis of identified literature and categorisation of extracted data, while Section "Analysis of synthesised data" analyses the findings. Section "Research results and discussion" discusses the research findings parallel to the RQs. Section "Research questions contribution to knowledge" concludes the review and identifies research gaps and future work.

Topology of engineering systems and uncertainty
As stated above, a complex system is comprised of multiple component parts or subsystems interacting linearly or nonlinearly, exhibiting a collective behaviour that is distinct from and seldom predictable by that of individual parts or subsystems [1][2][3]. Conversely, a complicated system can be comprised of a myriad of interconnected parts but still exhibit a predictable collective behaviour [2,3]. Complex systems science is a rapidly expanding and evolving field, the theory of which is widely documented [1][2][3][4][5]26,27]. A complex engineering system (CES) is one that is focused on an engineering domain rather than, for example, social, biological or meteorological systems. The inherently complex and dynamic nature of CES manifests high levels of uncertainty. This takes shape in various forms including costing, policymaking, supply chains and technical uncertainties [3,4]. Technical engineering uncertainties within engineering systems set the context for this review, where uncertainty in the performance of one component or subsystem (node) may have knock-on effects with interconnected nodes or the whole system. The level of uncertainty can change throughout the in-service life of each node in an unpredictable and often non-linear manner [2,3]. This calls for adaptive and intelligent approaches to forecast uncertainty based on a combination of available data and expert opinion.
There are several definitions and interpretations of uncertainty in literature [28][29][30][31][32][33][34][35][36][37][38][39]. It is defined here as the difference between the amount of information that is required to perform a task and the amount of information already possessed. The relevance of information, or lack of, should be specified concerning the functionality of the organisation or application in question [40]. Uncertainty is caused by variability in the environment, human error and/or human ambiguity (e.g. lack of knowledge) and could cause a negative, positive or neutral impact on the overall performance [41].
The terms error and uncertainty are often used interchangeably. Risk is generally interpreted as purely negative impacts of uncertainty [29,32,35,37,42]. It is important to differentiate these concepts. A statistical error is the (unknown) difference between the retained (measured) value and the true value, following probability distributions. Measurement uncertainty is the lack of information about the magnitude of these errors. Risk is the positive or negative impact specific sources of uncertainty will have on the measurand (the system for which uncertainty is being assessed). The degree of uncertainty associated with the measurand can be utilised to aid decision making.
There are two key types of uncertainty described in literature: Type A, which are sourced from quantitative data; and Type B, which make use of qualitative technical and expert knowledge or experience [16,30,34,[43][44][45]. These are further explored in Section "Uncertainty propagation and simulation techniques". In the context of this paper, Type A will hence be referred to as 'quantitative' and Type B as 'qualitative'. Uncertainty can be further characterised as aleatory and epistemic. Epistemic uncertainties are those that could be known in principle but are not known in practice [46][47][48][49][50]. This may be due to inaccurate measurements or the measurement model neglecting certain characteristics. Epistemic uncertainties can therefore be reduced by obtaining more data or by refining models. Aleatory uncertainty, however, cannot be reduced as it represents statistical variables that differ each time a given experiment is carried out [46][47][48][49][50][51][52][53][54][55][56][57]. The influence of different types of uncertainty can play a key role in confidence determination for risk and reliability analysis [36].
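As a concrete illustration of the two types (a minimal sketch with hypothetical readings, not drawn from the reviewed papers): a Type A standard uncertainty is evaluated statistically from repeated measurements as the standard deviation of the mean, while a Type B value can be assigned from expert knowledge, for example by assuming a rectangular distribution of stated half-width.

```python
import math
import statistics

def type_a_uncertainty(readings):
    """Type A evaluation: experimental standard deviation of the mean, s / sqrt(n)."""
    n = len(readings)
    return statistics.stdev(readings) / math.sqrt(n)

def type_b_rectangular(half_width):
    """Type B evaluation: standard uncertainty of an assumed rectangular
    distribution of half-width a, given by a / sqrt(3)."""
    return half_width / math.sqrt(3)

# Hypothetical repeated measurements of a component parameter
readings = [20.1, 20.3, 19.9, 20.2, 20.0]
u_a = type_a_uncertainty(readings)   # quantitative (statistical) evaluation
u_b = type_b_rectangular(0.5)        # qualitative (judgement-based) evaluation
```

Both evaluations yield a standard uncertainty in the same units, which is what allows Type A and Type B contributions to be aggregated later in the analysis.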
Further examination can be made by the four '(un)known-(un)known risk quadrants', described in detail by Marshall et al. [58]. These levels of risk identification can be applied to both quantitative and qualitative uncertainty since risk is the impact of uncertainty on the measurand. As their names suggest, 'known knowns' are uncertainty sources that have been taken into account and catered for; 'known unknowns' are understood to exist but their magnitude is not defined; 'unknown knowns' are unidentified sources that may be accounted for through alternate means (possibly by other sources creating information asymmetry [59]); 'unknown unknowns' have not been identified or accounted for and, therefore, pose the greatest risk [4,58,60]. These can also represent forecastable uncertainties not initially apparent and unpredictable 'black swan' events. A categorisation of uncertainties centred on these four quadrants based on the nature and source of uncertainty is illustrated in Fig. 1 [58,60]. An example uncertainty source for each quadrant is linked to possible types: quantitative, qualitative, epistemic and aleatory.
Frameworks to assess uncertainty in engineering systems, as well as the systems themselves, require a degree of flexibility to accommodate complexity while maintaining a degree of robustness to meet core objectives within specified confidence boundaries [4,10,47]. Flexibility in engineering systems design allows for mitigation in the face of unknown-unknowns, allowing the system to "evolve" when presented with unpredictable challenges to the point of being reconfigurable with high degrees of freedom [4,6,18]. Robust systems are highly reliable within their design scope and predictable range of associated uncertainty [61]. The level of complexity in a robust system is controlled by identifying and mitigating factors that pose the greatest uncertainty [3,6]. The flexibility of machine learning algorithms allows uncertainties to be forecast in a variety of complex domains, examined further in Section "Analysis of synthesised data".

Research definition
Defining the research scope is necessary to frame clear, answerable questions that formulate the aim and objectives described above, which in turn inform the search terms and inclusion/exclusion criteria in the succeeding phases [22]. Various formal frameworks have been composed to define the research scope and successive RQs.
The PICOC framework illustrated in Table 1 was adopted for this review [22,24,25]. It was selected over alternatives proposed by Booth et al. [22,62-64], such as SPICE [63] and CIMO [65], as it provides a transparent and reproducible identification of key concepts to be implemented in the SALSA framework.
The scope was adapted as more research was uncovered and the author's understanding of the topic grew. The resulting objectives and corresponding RQs are depicted in Table 2. These objectives were derived as the basis to achieve the outcomes defined in the PICOC framework to establish key approaches to quantify and forecast uncertainty in the maintenance of engineering systems.

Review methodology: appraisal and synthesis
The literature search comprised the formulation of a search string entered into online databases with filters applied (article type and publication year), the inclusion of previously cited and recommended papers (hand search), and publications cited in highly relevant sources [63]. The resulting string and search results are illustrated in Appendix A. The specific search process was not deemed relevant for this journal publication. From the database search, 148 files were downloaded on the basis of accessibility, format, title and date. The hand search sourced 119 papers, while 24 were sourced from citations within sourced papers. This resulted in a pool of documentation to assess in the appraisal phase. Inclusion and exclusion criteria are required to refine the results, as well as a structured data extraction methodology, defined in the following sections.

Appraisal of identified literature

Quality assessment
It is necessary to refine the number of publications obtained to appropriately satisfy the RQs and assess the evidence base. To do this, a critical assessment of relevance and quality was conducted. The broad selection process in Fig. 2, adapted from Booth et al. [22], was implemented considering the PICOC framework in Section "Research definition", as well as other review examples and author experience. Specific inclusion and exclusion criteria, based on the PICOC framework, are identified in Table 3 [22,23,66].

Data extraction
A data extraction table was composed in MS Excel (Appendix B) to manage the literature and assess the evidence base, allowing different studies to be appraised in a consistent manner [22]. This included a record of:
Publication details: source folder, filename, publication title, author, year, type (journal, book, etc.), source method (database search, citation search, recommended) and author keywords.
Study details: context, aims/objectives, methodologies/theory adopted, data collection strategies.
Results: author's conclusions, outcome/findings, strengths, limitations.
Publication details were recorded for all sources that passed the screening stage in Fig. 2. Eligibility was established in four main sifting stages: title, abstract, introduction/conclusion and full-text reading. If deemed eligible based on title, a preliminary understanding of study details and results was obtained from the abstract to gain familiarity and identify key information. Publications considered relevant were then examined in more detail to gain a comprehensive understanding in the next two stages. This allowed papers to be summarised into categories and relationships to be identified for synthesis [67]. Cited publications within papers that could enhance the research picture were searched for directly and fed back into the start of the process. A total of 185 papers were eliminated in the process, based on the sifting stages illustrated in Figs. 2 and 3.

Synthesis of extracted data
The synthesis phase of the SALSA framework overlaps with the search and analysis phases to produce a breakdown of extracted data, comparing similarities and differences within each category [22]. This phase will identify what the literature says. The analysis identifies what it means.
Data extracted from the papers was categorised through thematic synthesis, a well-validated method for synthesising qualitative data [22-25,68]. Key themes were established according to the research scope defined in the PICOC framework (Table 1), the RQs, discussions with academic supervisors and the author's understanding of the topic.

Table 1 Research scope definition - PICOC framework.
Population: Uncertainty prediction and assessment; considering the impact attributed by a combination of quantitative and qualitative inputs over the in-service phase of complex or non-complex engineering systems
Intervention: Examination of existing UQ techniques, qualitative assessments, uncertainty forecasting, multivariate uncertainty aggregation for differing probability distributions
Comparison: Current industrial practices; how does the new proposal compare to the existing methods and academic processes?
Outcomes: Determination of relevant probability distributions and guidance on how to quantify uncertainty in context to aid decision making for industrial maintenance; identification of methodologies to quantify qualitative uncertainty attributes, combine quantitative and qualitative uncertainties and assess significant correlations; identification of methodologies to forecast uncertainty through the in-service life and optimise outputs as new information is acquired
Context: Multivariate quantification, aggregation and forecasting of technical engineering uncertainty for engineering systems in-service, applicable to industrial maintenance

The thematic categorisation involved the generation of several categories for each established theme. This was achieved through a repetitive word-counting process whereby the most frequently used words in the full text of each included paper were cross-referenced with the proposed category names using VLOOKUP functions in Excel (snapshot in Appendix C). The most recurring words were more likely to be identified as categories that could be applied to the themes. This process required several iterations to combine and refine categories within a larger area and to eliminate less frequent or irrelevant words, identified by the same method as the key themes. The category term recurring most frequently in each paper was highlighted. For each category, the number of highlighted cells over the 107 papers was added to the number of papers containing that category term.
The resulting 'score' was then used to identify the most relevant categories in each theme, combining similar terms. The resulting themes and categories are defined in Table 4. Where applicable, the pros and cons of these categories are discussed in the Analysis phase.
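The counting step behind this scoring can be sketched as follows (a simplified illustration of the idea, not the authors' Excel/VLOOKUP workbook; the paper texts and category terms are hypothetical):

```python
import re

def category_scores(paper_texts, categories):
    """For each candidate category term, count total occurrences across all
    papers and the number of papers containing the term at least once."""
    scores = {}
    for term in categories:
        total = 0
        papers_containing = 0
        for text in paper_texts:
            hits = len(re.findall(r"\b" + re.escape(term) + r"\b", text.lower()))
            total += hits
            papers_containing += 1 if hits else 0
        scores[term] = {"occurrences": total, "papers": papers_containing}
    return scores

# Hypothetical full-text snippets from two included papers
papers = [
    "monte carlo simulation propagates uncertainty through the model; monte carlo",
    "a sensitivity analysis identified the key inputs",
]
scores = category_scores(papers, ["monte carlo", "sensitivity"])
```

Summing the two counts per term gives a relevance score analogous to the one described above, from which the strongest category terms per theme can be retained.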
Theme and category definitions were determined through the author's interpretation of occurrences in literature as well as dictionary definitions. An example of the thematic synthesis data extraction for 3 papers is illustrated in Table 5.

Analysis of synthesised data
This section examines the categorised themes from the synthesis and extracted data to answer the research questions (RQs) defined in Section "Research definition". Thematic analyses are presented to examine the coverage of each theme over the included papers and correlations between them, assessing results from the synthesis. Narrative analysis is presented for each theme to discuss results and case examples. The evidence base and correlations from the thematic and narrative analyses are evaluated to answer the RQs in Section "Research results and discussion". Any generated hypotheses were grounded to populate an emergent theory. Conclusions are drawn and compared with other studies in the category [22,72].
The identification of uncertainties that influence the measurand will inherently vary depending on the context of the measurand, be it a simple system under laboratory conditions or a complex engineering system (CES) with a myriad of interconnected subsystems. The adoption and adaptation of this process in various contextual applications in the included papers is discussed in Section "Contextual application". Section "Uncertainty propagation and simulation techniques" examines RQ1, focusing on the aggregation of uncertainty across multiple elements. Section "Probability distributions for uncertainty analysis" looks at the selection and use of relevant probability distributions to conduct the analysis. RQ2 is examined in Section "Qualitative uncertainty analysis", where methods to conduct qualitative uncertainty analysis are examined. Section "Uncertainty assessment and forecasting" examines RQ3, focusing on forecasting uncertainty for the in-service phase of engineering systems.
Publication details of year and type for the 107 included papers are illustrated in Figs. 4 and 5. The majority of examined papers were published in 2019-2020. A positive linear trend in publications up to the present indicates growing relevance and interest within the research scope. The term 'Conference' includes workshops; 'Book' includes book sections and booklets.
The majority of examined publications are journal articles, which are identified specifically in Table 6. 'Other' consists of journal publications featured once.

Contextual application
The contextual application theme identified in Table 4 groups publications, as the name suggests, by their applied context. Through the refinement process described in the synthesis, 4 categories were identified: Aerospace & defence (incl. nuclear weapons and other military applications); Emissions, energy & environment (incl. oil & gas, meteorology, energy & power, greenhouse gases and coastal models); Manufacturing & maintenance (incl. optimisation of processes around PSS and in general, structured surfaces, machine tooling and miscellaneous case studies); and Theory (incl. description and derivation of analytical methods without a specified application). The number and percentage distribution of these applications are illustrated in Fig. 6.
The majority of included papers examine the theory in uncertainty analysis, aggregation and forecasting (41%). These include statistical analysis, qualitative methods such as the pedigree approach and machine learning and Bayesian reasoning for forecasting. Papers applied in the other three contexts are reasonably distributed.

Uncertainty propagation and simulation techniques
This section examines the identified techniques to propagate uncertainty. The percentage of the 107 included papers that make use of or adapt the main techniques identified through the synthesis is illustrated in Fig. 7, stacked by contextual application. The 'other' category encompasses less-used methods in the research context, such as Latin hypercube sampling and Taylor series expansion.

Definitions from Table 4 (themes and categories):
Correlation: Level of interdependence between 2 or more variables
Degrees of freedom: Amount of information in a sample relevant to the estimation of a parameter
Expertise/assumption: Derivation of a parameter through opinion-based, non-statistical means
Fuzzy set theory: A function assigns a grade between 0 and 1 to each input parameter of a set, as opposed to Boolean logic, which assigns 0 or 1
Monte Carlo: Highly effective and flexible simulation technique to generate random variables about specified input parameters for multiple distribution types
Neural network: Network of cooperating processing elements to give an output; applied to a model and 'trained' to give an optimum output
Pedigree matrix: Scores results of qualitative expert judgement or assumptions according to predefined criteria to allow for quantitative assessment
Sensitivity analysis: Identifies key input parameters for uncertainty analysis; quantifies how changes in input value alter that of the outcome
Survey/interview: Qualitative data collection method for expert or general population opinion on a given topic
Other: Methods not used in many papers
Probability distributions (theme): Type of distribution function (PDF) used to represent uncertainty about a given range in the analysis process; categories include Beta (see Table 7)
Uncertainty assessment and forecasting (theme): Most prominent terms and qualities used to predict and forecast uncertainty
Challenges: Hinders, adds complexity or prevents action towards a given entity
Deep learning: Use of artificial neural networks to learn from existing data to predict or optimise future results
Forecasting: Predicting future trends based on past and present data
Life cycle: A series of stages or developments that take place over the useful lifetime of a given product or service
Optimisation: Finding the best or most effective use of a situation or resource
Over time: Measurable progress of past, present and future events
Prediction: Estimate that something will happen or will be a consequence of something else; synonym for forecasting

The categorised techniques can apply to purely quantitative (Section "Quantitative uncertainty analysis"), qualitative (Section "Qualitative uncertainty analysis") or multivariate (Section "Multivariate uncertainty analysis") uncertainty quantification and analysis. The distribution of analysis types by contextual application is shown in Fig. 8.
Purely quantitative analysis is considered by 43 papers (40%), purely qualitative by 23 (21%) and a multivariate combination by 41 (38%). One of the core focuses of this research is to place qualitative approaches on a comparable footing with quantitative approaches so that multivariate technical engineering uncertainties in engineering systems can be quantified. This consideration is necessary for real-world applications, though not essential when considering the costing of such systems in theory. This is further explained in Sections "Qualitative uncertainty analysis" and "Multivariate uncertainty analysis".
Terms such as 'variance', 'standard deviation' and 'stochastic' were not included as they were considered too generic. Some commonly used techniques appear to feature less frequently than one might expect (e.g. degrees of freedom in 9% of the 107 papers). The reason for this is that some studies focus on a specific part of the analysis process (e.g. uncertainty source identification through expert opinion or interviews) and so consider other stages to be out of scope.

Quantitative uncertainty analysis
Purely quantitative uncertainty analysis focuses on epistemic, statistical data. Techniques are discussed in theory below, which are then applied in case examples. Qualitative aspects need to be taken into consideration to be applied to real-world dynamic cases. The most commonly used techniques in the included papers that focus on quantitative analysis are illustrated in Fig. 9, again stacked by contextual application.
Uncertainty is statistically equal to the standard deviation of a given dataset, which is equal to the square root of the distribution variance and referred to as the 'standard uncertainty' [30,73]. Many potentially identifiable uncertainties will have a negligible impact on the measurand. To maintain focus on uncertainties that have a tangible impact on the system, alongside expert judgement, sensitivity analysis is conducted across the input parameters [47,48,61,70,[74][75][76][77][78][79][80][81]. Sensitivity and correlation are key considerations in both single type and multivariate UQ. As seen in Fig. 9, 40% of the 43 quantitative analysis papers reviewed explicitly use sensitivity analysis and 23% discussed correlation between the inputs.
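One common way to focus the analysis on the inputs that matter, sketched here with a hypothetical cost model and assumed values, is an uncertainty budget: estimate each sensitivity coefficient by finite differences and rank inputs by their contribution |c_i|·u_i to the output uncertainty.

```python
def sensitivity_coefficient(model, inputs, name, step=1e-6):
    """Central finite-difference estimate of d(model)/d(input) at the nominal point."""
    hi = dict(inputs); hi[name] += step
    lo = dict(inputs); lo[name] -= step
    return (model(hi) - model(lo)) / (2 * step)

def uncertainty_budget(model, inputs, std_uncertainties):
    """Rank inputs by their contribution |c_i| * u_i to the output uncertainty."""
    budget = {n: abs(sensitivity_coefficient(model, inputs, n)) * u
              for n, u in std_uncertainties.items()}
    return sorted(budget.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical maintenance-cost model: cost = rate * hours + parts
model = lambda p: p["rate"] * p["hours"] + p["parts"]
nominal = {"rate": 50.0, "hours": 10.0, "parts": 200.0}
u = {"rate": 2.0, "hours": 1.5, "parts": 20.0}
ranking = uncertainty_budget(model, nominal, u)   # most influential input first
```

Inputs near the bottom of the ranking have a negligible impact on the measurand and can be screened out, keeping the subsequent propagation focused on the dominant sources.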
Monte Carlo simulation is by far the most widely used simulation method to evaluate uncertainty; used in 63% of the 43 quantitative papers and 52% of the total 107 included papers. It is stated to provide the most effective approach to the propagation and analysis of uncertainty in many situations for various combinations and complexities [16,20,31,50,[81][82][83][84][85][86]. It can be applied to multiple probability distributions for multivariate analysis and forecasting. Extensive sampling of uncertainty ranges for individual variables can be achieved without the use of substitute models [80]. However, it can require significant computational power, with 1000-10,000 simulation runs generally accepted as appropriate coverage depending on model complexity [83,87]. Taylor series expansion and Latin hypercube sampling are often used as part of Monte Carlo simulation, but are largely covered in theory in the included papers [80,83,88]. These two methods are considered in the 'Other' category in Fig. 9. Bayesian analysis derives the probability of an event occurring given that a prior event has occurred, given as a probabilistic function of the two events occurring independently or together [30,50,89]. Bayesian methods applied in forecasting are covered in further detail in Section "Uncertainty assessment and forecasting".
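A minimal Monte Carlo propagation sketch follows (the model and parameter values are hypothetical; 10,000 runs, in line with the coverage figures cited above):

```python
import random
import statistics

def monte_carlo(model, input_dists, runs=10_000, seed=42):
    """Propagate input uncertainty: sample each input from its (mean, std)
    normal distribution, evaluate the model, and summarise the output."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(runs):
        sample = {name: rng.gauss(mu, sigma)
                  for name, (mu, sigma) in input_dists.items()}
        outputs.append(model(sample))
    return statistics.mean(outputs), statistics.stdev(outputs)

# Hypothetical repair-cost model with two uncertain inputs
model = lambda p: p["labour_hours"] * p["hourly_rate"]
dists = {"labour_hours": (10.0, 1.0), "hourly_rate": (50.0, 5.0)}
mean_cost, std_uncertainty = monte_carlo(model, dists)
```

Other distribution types (beta, triangular, lognormal) can be substituted per input without changing the propagation logic, which is one reason the method dominates multivariate analyses.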
In 1995, the International Standards Organisation (ISO) published the Guide to the Expression of Uncertainty in Measurement (GUM). This is commonly referred to in literature as 'the Guide' or 'GUM' and has seen various updates and expansions since its inception [30,34,43,45,85,90]. The general uncertainty analysis process defined by the GUM involves 5 core stages [30,34,90]: (1) identify the measurand; (2) identify uncertainty sources and associated probability distributions; (3) quantify uncertainties (simulation); (4) aggregate uncertainties; (5) report analysis results. While proficient for purely quantitative estimates, the GUM employs coverage factors and confidence limits to accommodate qualitative or multivariate estimates. These often lead to underestimation, do not permit flexibility and, therefore, cannot be realistically applied in dynamic, complex engineering systems [61,91]. Since its inception, the GUM has been applied and adapted to assess uncertainty in a range of applications, from structured surfaces [92] to micro gear measurement [93], smart grid power systems [77] and risk and reliability assessment in the nuclear weapons sector [56]. Uncertainty typically increases when correlation between input parameters is considered. This, along with sensitivity analysis to identify the inputs with the greatest impact, is a key consideration for rigorous uncertainty analysis that captures risk with the best possible model representation.

Complex system uncertainty analysis involves representations of epistemic and aleatory uncertainty. For epistemic analysis, uncertainty can be represented through various means including interval analysis, possibility theory, evidence theory and probability theory [2,4,56,94]. Probability theory is the dominant method, but others can be useful in the CES context, as examined and compared in Sections "Qualitative uncertainty analysis" and "Uncertainty assessment and forecasting".
The main challenges for UQ in these contexts include the aggregation of information from multiple sources and the propagation of complex computational models that incorporate flexibility in design while holding a degree of robustness to deliver on objectives [2,6,10,56].
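For reference, the GUM aggregation stage discussed above combines uncorrelated contributions in quadrature and reports an expanded uncertainty via a coverage factor (commonly k = 2 for roughly 95% coverage of a normal output). A minimal sketch with hypothetical sensitivity coefficients and standard uncertainties:

```python
import math

def combined_standard_uncertainty(contributions):
    """GUM law of propagation for uncorrelated inputs:
    u_c = sqrt(sum((c_i * u_i)**2)) over (sensitivity, standard uncertainty) pairs."""
    return math.sqrt(sum((c * u) ** 2 for c, u in contributions))

def expanded_uncertainty(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c; k = 2 gives ~95% coverage
    when the output distribution is approximately normal."""
    return k * u_c

# Hypothetical (sensitivity coefficient, standard uncertainty) pairs
contributions = [(10.0, 2.0), (50.0, 1.5), (1.0, 20.0)]
u_c = combined_standard_uncertainty(contributions)  # quadrature sum
U = expanded_uncertainty(u_c)                       # reported interval half-width
```

The fixed coverage factor illustrates the limitation noted above: it presumes a well-behaved output distribution, which qualitative or strongly correlated multivariate estimates in dynamic systems rarely satisfy.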

Qualitative uncertainty analysis
The identification of known qualitative uncertainty sources typically relies on expert opinion. Methods used to aid this process include surveys, interviews and the pedigree matrix [15,16,41]. Qualitative frameworks are often used in conjunction with quantitative methods such as Monte Carlo and sensitivity analysis in the context of real-world applications. Therefore, the majority of qualitative applied cases are discussed in the next section, including those considering surveys and interviews. Fig. 10 shows the distribution of techniques used in purely qualitative analyses. Expert opinion and Monte Carlo were implemented in 52% and 26% of the 23 qualitative papers respectively. This section will examine commonly used qualitative propagation approaches, namely the pedigree matrix, as well as comparisons between probability theory, evidence theory and fuzzy set theory.
The pedigree matrix was derived by Funtowicz and Ravetz [95] to score qualitative (expert) knowledge and opinion against predefined criteria to permit quantitative reliability assessment. It has been used in 17% of the 23 papers considering purely qualitative analysis (Fig. 10), applied solely in the emissions, energy & environment context, and in 22% of the 41 papers considering multivariate analysis (Fig. 11), applied across all 4 considered contexts, though again largely in emissions, energy & environment. It has also been applied in medical fields and genealogy, largely visualised using decision trees, though these are not examined in the scope of this review. Pedigree assessment relies on expert opinion. Pedigree criteria are defined according to the contextual application of the study [15,16,96]. Qualitative assumptions made in uncertainty analysis can have a significant impact on the resulting estimate, especially in the context of complex systems. The application of the pedigree matrix to complex environmental problems can highlight bias, implausibility, disagreement among stakeholders, limitations and sensitivities [76].
The pedigree approach can be applied on its own or through a notational scheme devised by Funtowicz and Ravetz [95] to standardise multivariate uncertainty dimensions via 5 qualifiers: Numeral, Unit, Spread, Assessment and Pedigree (NUSAP). The first 3 terms consider quantitative factors: the quantity value, acquisition date and random error of the variance of the dataset (addressed by sensitivity analysis and Monte Carlo), respectively. Implementation of NUSAP is further discussed in Section "Multivariate uncertainty analysis".
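A pedigree assessment of this kind can be sketched as follows. The criteria names, the 0–4 scoring scale and the score-to-factor mapping are all illustrative assumptions; real studies define these according to context [15,16,96]:

```python
# Hypothetical pedigree assessment: criteria, scale and mapping are
# invented for illustration, not taken from Funtowicz and Ravetz.
CRITERIA = ["proxy", "empirical basis", "methodological rigour", "validation"]

def pedigree_score(scores):
    """Average one expert's scores (0 = weak ... 4 = strong) across criteria."""
    assert len(scores) == len(CRITERIA)
    return sum(scores) / len(scores)

def uncertainty_factor(mean_score):
    """Map a mean pedigree score to a multiplicative uncertainty factor
    (higher pedigree -> lower uncertainty); a linear mapping is assumed."""
    return 1.0 + (4.0 - mean_score) * 0.25   # score 4 -> 1.0, score 0 -> 2.0

expert_scores = [3, 2, 4, 3]                 # one expert's scores per criterion
m = pedigree_score(expert_scores)
print(f"mean pedigree score {m:.2f}, factor {uncertainty_factor(m):.2f}")
```

The resulting factor is the kind of quantitative hook that lets expert opinion be aggregated alongside measured data, as discussed for GSD and CV later in the review.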
Additional uncertainty propagation approaches include probability theory, evidence theory and fuzzy set theory [31,89]. Probability theory is the 'classic' UQ method for input parameters with definable probability distributions, discussed in much of this review. Evidence theory makes use of artificial intelligence and machine learning to collate evidence from different sources and presents an evaluation to understand whether the available evidence is common or contradictory [19,31,97]. Evidence theory can avoid deterministic decision-making, which considers the outcome alone without associated risk, by keeping an 'open eye' to new information, governed by a belief system that dictates possibility measures [19]. This may be a suitable approach for qualitative reasoning but is less suited to estimating quantitative uncertainty, which is centred on recorded data [31].
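Combining evidence from two sources can be illustrated with Dempster's rule of combination, the classic operation of evidence (Dempster-Shafer) theory. The frame of discernment {A, B} and the mass assignments below are invented for illustration:

```python
from itertools import product

# Minimal Dempster's rule of combination; masses are illustrative.
def combine(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass)."""
    combined, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:                        # agreeing evidence reinforces
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:                            # contradictory evidence is conflict
            conflict += v1 * v2
    # Renormalise by the non-conflicting mass (assumes conflict < 1).
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

A, B = frozenset("A"), frozenset("B")
theta = A | B                            # the full frame (ignorance)
m1 = {A: 0.6, theta: 0.4}                # source 1: some belief in A
m2 = {B: 0.3, theta: 0.7}                # source 2: some belief in B
m = combine(m1, m2)
print(m)
```

The mass left on the full frame `theta` is the 'open eye' the text refers to: belief not yet committed to any single outcome, ready to be reassigned as new evidence arrives.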
Fuzzy set theory is applied in machine learning to assign a grading to input parameters (e.g. a scale of 0 to 1 rather than 0 or 1). This is well suited to cases where recorded data and knowledge are lacking and available data is inherently subjective [15,19,31,49,89,98]. This lack of mediated data is one of the major challenges in UQ for both complex and non-complex engineering systems [1,2,14,15,41,71,98]. Uncertainty analysis where data is scarce benefits greatly from the application of artificial neural networks (NNs). These networks of cooperating input elements are applied to a model and trained to give an optimum output by learning from previous examples [19,99]. NNs are a go-to option for forecasting and prediction tasks, discussed further in Section "Uncertainty assessment and forecasting".
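The fuzzy grading idea can be sketched with a triangular membership function; the 'moderate wear' label and its breakpoints are invented for illustration:

```python
# Fuzzy grading sketch: a triangular membership function assigns a
# grade in [0, 1] instead of a Boolean; parameters are illustrative.
def triangular(x, a, b, c):
    """Membership rising from a to a peak at b, falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Grade "moderate wear" for a measured wear level of 0.35 on a 0-1 scale.
grade = triangular(0.35, 0.2, 0.5, 0.8)
print(grade)   # a partial membership rather than a hard 0/1 classification
```

A subjective expert judgement ("somewhat worn") maps naturally onto such a partial grade, which is why the approach is recommended where data is scarce or inherently subjective.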

Multivariate uncertainty analysis
The term 'multivariate' is defined here as the combination of uncertainty from quantitative, measured, recorded data and qualitative, experience-driven opinion or human factors. Since qualitative estimates are obtained from technical expert knowledge or experience, they were not initially classed as pure statistical quantities with definable degrees of freedom [30,44]. The GUM proposed coverage factors and confidence limits as methods to accommodate qualitative or multivariate estimates. An 'effective' number of degrees of freedom is applied using the Welch-Satterthwaite formula [30,100], though this was later found to lead to underestimation of the combined uncertainty [43,45,91,101,102]. Since then, a range of advanced qualitative, quantitative and multivariate methods have been proposed to gauge qualitative estimates in a way that can be statistically equal to quantitative estimates. The percentage of included papers that used multivariate analysis is shown in Fig. 11. Expert opinion and assumptions made to carry out the assessment were considered in 39% of the 41 multivariate analysis papers (discussed in the previous section). The quality of the opinion sways the confidence in the result (considered in 49% of multivariate analysis papers), which can be determined through the pedigree matrix (in 22%) and sensitivity analysis (in 32%).
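The Welch-Satterthwaite calculation referred to above can be sketched directly; the component uncertainties and their degrees of freedom below are invented, and sensitivity coefficients are taken as 1 for simplicity:

```python
# Effective degrees of freedom via the Welch-Satterthwaite formula,
# as used by the GUM for combined estimates; component values are
# invented for illustration.
def welch_satterthwaite(components):
    """components: list of (standard_uncertainty, degrees_of_freedom)."""
    u_c2 = sum(u**2 for u, _ in components)       # combined variance
    denom = sum(u**4 / nu for u, nu in components)
    return u_c2**2 / denom                        # nu_eff = u_c^4 / sum(u_i^4/nu_i)

nu_eff = welch_satterthwaite([(0.3, 9), (0.2, 4), (0.1, 50)])
print(f"effective degrees of freedom: {nu_eff:.1f}")
```

The effective degrees of freedom then select a coverage factor from the t-distribution; the underestimation criticised in the text arises when qualitative components are forced through this purely statistical route.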
The pedigree matrix can be applied to simple calculations and complex models through explicit and systematic reflections on multivariate uncertainty [15,16,96,103]. Uncertainty estimation in life cycle costing under product-service systems (PSS) is a growing field of interest, where uncertainty changes throughout the life cycle stages [7,9,14,[19][20][21]94]. Uncertainty analysis in PSS is examined further in Section "Uncertainty assessment and forecasting". NUSAP has been implemented to estimate uncertainty in cost estimation from different sources at the bidding stage of industrial PSS contracts in the aerospace & defence context [104]. Uncertainties were identified through a predefined classification; commercial, affordability, performance, training, operation, engineering (CAPTOE) [105] and ranked using NUSAP [16].
The incorporation of qualitative estimates with quantitative assessments in the in-service phase of industrial PSS may present challenges due to increasing complexity but can also draw parallels from other phases of the life cycle [7,10]. Additional reviews have analysed value capture for PSS throughout the product life cycle on the transition to servitisation [9], availability support [106] and information flow [21]. Lack of concrete data and qualitative decisions cause uncertainty that can lead to undesirable results. This also prompts the need for flexibility in PSS under uncertainty [6].
Data quality in life cycle assessments (LCA) is enhanced through a multivariate consideration of parameters. The use of pedigree and sensitivity analysis allows uncertainty parameters with negligible impact to be eliminated, enabling focus on those that influence the measurand [20,70]. This helps to alleviate the tradeoff between accuracy and implementation costs in LCA to identify the most significant input parameters.
Another application domain of multivariate uncertainty in engineering systems is real-time systems. Largely considered in software engineering, these systems are highly dependent on confident and thorough uncertainty estimates to account for worst-case scenarios [107,108]. Uncertainties considered can range from computational processing times [107,109] to environmental and human factors, such as in virtual reality (VR) applications with remote maintenance [108]. Literature concerning real-time systems in this review is considered under the manufacturing and maintenance context. Real-time systems are inherently complex owing to the range of assumptions taken into account and the unpredictable behaviour and interaction of system elements. To obtain confident predictions of worst-case execution times, evolutionary algorithms are employed along with surrogate models, neural networks and regression models, further explored in Section "Uncertainty assessment and forecasting" [109,110].
Further applications of the pedigree matrix and sensitivity analysis, along with Monte Carlo and Taylor series expansion, are made in the oil & gas sector to estimate uncertainty in greenhouse gas emissions [83]. These highly complex operations involve multivariate estimates that require rigorous treatment. Confidence levels associated with individual sources are dependent on data availability and quality. This process followed the core methods described in the GUM [30,34,83,90]. While applied solely to the oil & gas sector in the examined literature [83], the analysis method should be applicable in broader areas within the research scope.
A further approach to gauge qualitative uncertainty factors through the pedigree approach in a way that they can be attributed to quantitative estimates is as a geometric standard deviation (GSD), which fits to lognormal distributions [85]. The measure of GSD is necessary to overcome scaling in data. As discussed in Section "Quantitative uncertainty analysis", the standard deviation is representative of the uncertainty in a given dataset, which relies on the scale (unit) of linear data [85]. Therefore, for the analysis of data from varying sources and measured in different units, uncertainty factors need to be independent of scaling effects.
Using GSD as the uncertainty measure overcomes scale dependency. If the data source does not follow a lognormal distribution, GSD ratios are obtained via the coefficient of variation (CV) [84,111]. This approach considered quantitative uncertainty sources attributed to epistemic error and qualitative sources due to imperfect data. The CV is a dimensionless measure of variability defined as the ratio between the standard deviation and the mean [84,85,111,112]. Muller et al. [84] devised formulas to apply the CV to various distributions to allow selection of the most appropriate probability distribution functions (PDF) for the analysis. The robustness of this method was tested for each parameter PDF using Monte Carlo simulation. This is a key method to combine multivariate uncertainties through different PDFs.
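The GSD/CV machinery can be sketched for the lognormal case; the individual GSD values below (e.g. a basic uncertainty factor plus two pedigree-derived factors) are invented for illustration:

```python
import math

# Sketch of aggregating multiplicative uncertainty factors via GSD
# under the lognormal approach; the GSD values are invented.
def cv_from_gsd(gsd):
    """Coefficient of variation of a lognormal with geometric SD `gsd`."""
    sigma = math.log(gsd)                 # log-space standard deviation
    return math.sqrt(math.exp(sigma**2) - 1.0)

def aggregate_gsd(gsds):
    """Combine independent lognormal factors: log-variances add."""
    total_var = sum(math.log(g)**2 for g in gsds)
    return math.exp(math.sqrt(total_var))

gsds = [1.1, 1.2, 1.05]   # e.g. basic uncertainty + two pedigree factors
total = aggregate_gsd(gsds)
print(f"combined GSD {total:.3f}, CV {cv_from_gsd(total):.3f}")
```

Because the GSD is a dimensionless ratio, sources measured in different units combine on the same footing, which is exactly the scale independence the text calls for.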

Probability distributions for uncertainty analysis
The selection of the most appropriate PDF depends on the nature of each input parameter (quantitative or qualitative sources) and how it is recorded [15,113]. The most common types of PDF used in the included papers are stacked by their contextual application in Fig. 12.
Statistical measured data is typically represented by the normal (Gaussian) distribution, used in 58% of the 107 examined papers, or the lognormal distribution, used in 9%. Uniform distributions are considered in 33% of papers. When recording data, an individual digital readout has a uniformly distributed uncertainty, since the true value may lie anywhere within the resolution of the displayed digit. The values of the readout are represented by a different distribution, depending on how they were recorded. Several publications therefore considered more than one type of distribution. Table 7 describes the main distributions identified in the papers, adapted from Stockton and Wang [113], Everitt and Skrondal [33], and Erkoyuncu [15].
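The uniform readout uncertainty mentioned above has a standard closed form: for a resolution d, the true value is uniform over a width-d interval, giving a standard uncertainty of d divided by twice the square root of three. The resolution value below is illustrative:

```python
import math

# Standard uncertainty of a uniformly distributed digital readout:
# a uniform distribution of width d has standard deviation d/sqrt(12),
# i.e. d / (2 * sqrt(3)); the 0.01 resolution is illustrative.
def readout_uncertainty(resolution):
    return resolution / (2.0 * math.sqrt(3.0))

u = readout_uncertainty(0.01)    # e.g. a meter displaying two decimals
print(f"u = {u:.6f}")
```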
The Weibull distribution is used in reliability modelling and analysis for life cycle forecasting [11,14,114]. This could be an important distribution choice when considering forecasting uncertainty; however, it was only considered in 6% of papers included in this review.
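The reliability role of the Weibull distribution can be sketched through its survival function; the shape and scale parameters below are invented for illustration:

```python
import math

# Weibull reliability sketch for life cycle forecasting; the shape
# (beta) and scale (eta) parameters are illustrative, with beta > 1
# representing wear-out behaviour.
def weibull_reliability(t, beta, eta):
    """Probability the asset survives beyond time t: R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))

beta, eta = 2.0, 1000.0          # wear-out failure mode, scale in hours
for t in (200, 500, 1000):
    print(t, round(weibull_reliability(t, beta, eta), 3))
```

With beta > 1 the hazard rate rises with age, which is why the distribution suits forecasting of in-service degradation rather than constant-rate failures.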

Uncertainty assessment and forecasting
This section of the analysis focuses on how uncertainty can be forecast and modelled over the in-service phase of an asset's life cycle and where these methods are or can be applied to complex and non-complex engineering systems. The term 'assessment' refers to a judgement of value or quality based on available information, while 'forecasting' is the determination of the most likely future outcomes based on that information. The majority of studies identified centre around cost estimation in the PSS context [6,7,9,14,19,20,41,71,82,89]. The in-service phase of PSS covers the largest portion of the life cycle, situated between contract bidding and disposal. This phase calls for numerous equipment considerations including reliability, flexibility, availability and maintainability to ensure the asset is fit for purpose [19,104]. Each of these considerations raises challenges which promote numerous uncertainties, covered in Section "Multivariate uncertainty analysis". Schwabe et al. [82] stated that the ability to quantify and forecast cost uncertainty is often limited by minimal measurement points, lack of experience, unknown history and low data quality. This precipitates innovation hesitancy in the face of an ever-increasing pace of technological innovation [115].
One of the key emerging techniques to forecast uncertainty is deep learning, a subset of machine learning. The main difference between them is the way data is presented: machine learning algorithms tend to require structured data, whereas deep learning networks rely on layers of a neural network (NN). The terms are often used interchangeably. The quality of data ultimately determines the quality of the result. To give greater confidence in estimates such as maintenance costing, backpropagation algorithms can be applied to further improve the quality of NN training [11,18,110,[116][117][118][119]. Applications were reviewed in terms of their learning capability and reliability in uncertainty prediction. Stochastic models calculated from steady-state probabilities do not necessarily reflect reality, since maintenance policies can take several years to stabilise [11].
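The backpropagation mechanism can be shown at its smallest scale: a single linear neuron trained by gradient descent on squared error. The data and learning rate below are invented for illustration:

```python
# Minimal backpropagation sketch: one linear neuron, squared-error
# loss, stochastic gradient descent; data and rates are invented.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]  # target y = 2x + 1

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    for x, y in data:
        y_hat = w * x + b             # forward pass
        err = y_hat - y               # dL/dy_hat for L = err^2 / 2
        w -= lr * err * x             # propagate the gradient to the weight
        b -= lr * err                 # ... and to the bias
print(f"learned w={w:.2f}, b={b:.2f}")   # approaches w=2, b=1
```

A deep network repeats exactly this gradient step layer by layer via the chain rule; the quality of the training data bounds the quality of the learned parameters, as the text notes.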
A review of the theory of probabilistic modelling in machine learning and artificial intelligence was made by Ghahramani [17]. Data is the key element of machine learning systems, the capability of which is largely dependent on the probabilistic interpretation of uncertainty. Bayesian learning is the application of Bayesian probability theory in machine learning, where predictions or beliefs are updated when presented with new data via Bayes' Theorem [52,117,[120][121][122][123][124]. Smart [111] presented a methodology to develop cost estimating relationships from trends with limited data. Bayes' Theorem was utilised to combine prior data, experience or opinion with limited real-time data to produce accurate forecasts with a degree of confidence.
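The Bayesian updating described above can be sketched with a conjugate Beta-Binomial model, in the spirit of combining prior opinion with limited field data; the prior and the observed counts below are invented for illustration:

```python
# Bayes' Theorem sketch: updating belief about a failure rate with a
# Beta prior and Binomial evidence; all counts are invented.
def beta_update(alpha, beta, failures, trials):
    """Posterior Beta parameters after observing `failures` in `trials`."""
    return alpha + failures, beta + (trials - failures)

alpha, beta = 2, 18                      # prior: expert expects ~10% failure rate
alpha, beta = beta_update(alpha, beta, failures=3, trials=10)
posterior_mean = alpha / (alpha + beta)  # updated failure-rate estimate
print(f"posterior mean failure rate: {posterior_mean:.3f}")
```

The posterior blends the expert prior with the small sample, so each new batch of in-service data nudges the estimate without discarding accumulated experience.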
The main challenges in Bayesian learning are the quality and availability of prior data or knowledge and the flexibility of models to encompass all properties of data required to achieve the prediction task. Flexible models can make better predictions, but all predictions involve assumptions [17,120]. Gaussian processes are a highly flexible non-parametric approach to predict unknown functions and are widely used for regression and classification. Non-parametric model predictions get more complex with the density of training data, whereas parametric models have a fixed number of parameters. Gaussian processes are used to optimise the training process for machine learning models [120,125,126].
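A Gaussian process posterior can be sketched in one dimension with an RBF kernel; two training points are used so the linear solve can be written explicitly, and all values are invented for illustration:

```python
import math

# Minimal 1-D Gaussian process regression sketch: RBF kernel, two
# training points, explicit 2x2 inverse; values are illustrative.
def rbf(a, b, length=1.0):
    return math.exp(-((a - b) ** 2) / (2 * length**2))

xs, ys = [0.0, 2.0], [1.0, 3.0]          # training data
noise = 1e-6                             # small jitter on the diagonal

# Kernel matrix K + noise*I and its explicit 2x2 inverse.
k11 = rbf(xs[0], xs[0]) + noise
k22 = rbf(xs[1], xs[1]) + noise
k12 = rbf(xs[0], xs[1])
det = k11 * k22 - k12 * k12
inv = [[k22 / det, -k12 / det], [-k12 / det, k11 / det]]

def predict(x_star):
    """Posterior mean: k_*^T (K + noise I)^-1 y."""
    ks = [rbf(x_star, xs[0]), rbf(x_star, xs[1])]
    alpha = [sum(inv[i][j] * ys[j] for j in range(2)) for i in range(2)]
    return ks[0] * alpha[0] + ks[1] * alpha[1]

print(round(predict(0.0), 2))   # reproduces the first training point
print(round(predict(1.0), 2))   # interpolates between observations
```

The non-parametric character the text describes is visible here: the prediction is built from the kernel against every training point, so model complexity grows with the data rather than with a fixed parameter count.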
The end of Section "Qualitative uncertainty analysis" identified the endorsement of NNs to aid uncertainty analysis for complex engineering systems alongside fuzzy set theory. The terms and qualities identified in the synthesis to represent uncertainty assessment and forecasting are illustrated in Fig. 13 and stacked by contextual application. Uses of NNs and Bayesian techniques from Fig. 7 and selected distributions from Fig. 12 are included for comparison. Life cycles of products or services were considered in 39% of the 107 included papers, with 21% considering NNs and 32% considering Bayesian techniques. The Generalised Likelihood Uncertainty Estimation (GLUE) method uses Bayesian inference to assess uncertainty in model predictions. Largely applied in hydrology and meteorology, the method uses ensemble forecasting of weighted parameter sets to identify the contribution level of each set for a forecasted point in time [69,[127][128][129]. Simmons et al. [69] used the GLUE method as a tool for optimisation and estimation of best-fit parameters for numerical models in coastal engineering. The use of Monte Carlo simulation provided confidence intervals on predicted erosion levels, which can offer information to the decision-maker on expected uncertainty [115]. The GLUE method was seen to be an effective calibration technique for coastal erosion modelling. Simmons stated that the GLUE method provides an improvement in predictive skill, a rigorous evaluation of model sensitivity to parameters, an ability to identify differences in model performance and quantification of parameter-induced uncertainty.
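The GLUE workflow can be sketched as Monte Carlo sampling of parameter sets, an informal likelihood measure against observations, and a behavioural threshold; the toy model, observation and likelihood definition below are all invented for illustration:

```python
import random

# GLUE-style sketch: Monte Carlo parameter sets weighted by an
# informal likelihood measure; model, data and threshold are invented.
random.seed(42)
observed = 5.0

def model(theta):
    return 2.0 * theta                   # toy model to be calibrated

ensemble = []
for _ in range(5000):
    theta = random.uniform(0.0, 5.0)     # sample a candidate parameter set
    sim = model(theta)
    likelihood = max(0.0, 1.0 - abs(sim - observed) / 5.0)  # informal measure
    if likelihood > 0.2:                 # behavioural threshold
        ensemble.append((likelihood, theta))

total = sum(w for w, _ in ensemble)
weighted_mean = sum(w * t for w, t in ensemble) / total
print(f"{len(ensemble)} behavioural sets, weighted theta = {weighted_mean:.2f}")
```

Quantiles of the weighted ensemble would give the prediction bounds that GLUE reports to the decision-maker; only the retained "behavioural" sets contribute.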
A deep uncertainty quantification (DUQ) prediction model was proposed by Wang et al. [116] to learn from historic data through a negative log-likelihood error (NLE) calculation to forecast weather patterns. Optimised by backpropagation, the data-driven network incorporated uncertainty directly into the loss function to reduce errors. Relationships between variables were predicted by regression algorithms. The combination of deep learning and UQ was shown to improve generalisation of point estimation compared to RMSE calculation to forecast multi-step meteorological time series but is best suited to scenario modelling in meteorology.
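The loss at the heart of this style of model can be sketched directly: the network predicts a mean and a variance, and the Gaussian negative log-likelihood is minimised so uncertainty enters the loss itself. The numbers below are invented for illustration:

```python
import math

# Gaussian negative log-likelihood as a loss, as used when a network
# predicts both a mean and a variance; the values are illustrative.
def gaussian_nll(y, mu, var):
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mu) ** 2 / (2.0 * var)

y = 3.0
# An overconfident prediction (small variance) is penalised more when wrong:
print(round(gaussian_nll(y, mu=2.0, var=0.1), 3))
print(round(gaussian_nll(y, mu=2.0, var=1.0), 3))
```

Because the penalty for a miss scales with the claimed confidence, backpropagating this loss pushes the network to widen its predicted variance exactly where its point estimates are unreliable.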
Bayesian deep learning (BDL) is one of the most popular techniques to learn from and forecast data trends [17,52,120,121,125,130]. However, this approach requires significant modification of models, adopting variational inference instead of backpropagation, making them harder to implement and computationally slower, and it can even reduce test accuracy [116,125]. A theoretical framework proposed by Gal and Ghahramani [125] used a dropout training approach to approximate Bayesian inference in Gaussian processes in deep neural networks, which was shown to mitigate the problem presented by BDL. The uncertainty assessed here was in the deep learning process itself, not the resulting uncertainty interval.
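The Monte Carlo dropout idea can be sketched without any training: dropout is left active at prediction time and repeated stochastic forward passes yield a predictive mean and a spread. The tiny fixed "network" below is invented for illustration:

```python
import random
import statistics

# Monte Carlo dropout sketch: keeping dropout ON at test time and
# repeating forward passes approximates a predictive distribution.
# The single layer of weights and inputs below are invented.
random.seed(7)
weights = [0.8, -0.3, 0.5, 0.4]          # one layer of "learned" weights
x = [1.0, 2.0, 0.5, 1.5]
p_keep = 0.8

def stochastic_forward(x, weights):
    """Forward pass with dropout left on (inverted scaling by 1/p_keep)."""
    total = 0.0
    for xi, wi in zip(x, weights):
        if random.random() < p_keep:     # randomly drop units
            total += xi * wi / p_keep
    return total

preds = [stochastic_forward(x, weights) for _ in range(1000)]
mean = statistics.fmean(preds)
uncertainty = statistics.stdev(preds)    # spread approximates model uncertainty
print(f"prediction {mean:.2f} +/- {uncertainty:.2f}")
```

This is the practical appeal Gal and Ghahramani identify: an approximate Bayesian predictive distribution from an ordinary dropout-trained network, with no change to the training procedure.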
One of the most significant challenges highlighted in this review, in terms of uncertainty quantification and forecasting in all examined contexts, is data availability. Where there is insufficient data to fulfil the Central Limit Theorem, under which the normalised sum of inputs tends towards the Normal distribution, estimates cannot necessarily be made with enough confidence to produce rigorous estimates or forecasts [14,30,82,109,110,114]. To this end, Schwabe et al. [82,99] devised a novel approach to forecast cost uncertainty for a given point in time by spatial geometry, described through the symmetrical relationship between cost variance data at a given point in time, represented in a vector space. The life cycle under consideration was represented as an open complex system. These vectors were aggregated to give a probable cost variance represented in state space. There is limited literature on holistic, multivariate cost uncertainty estimation for the in-service phase of PSS [19,71]. Guidance is scarce to aid the selection of suitable uncertainty modelling methods such as NN, BDL and fuzzy set theory, which in themselves generally only consider epistemic forms of uncertainty [17,19,31,89,99].

Research results and discussion
The final phase of the review methodology discusses the results of the research conducted through the SALSA framework [22]. An evaluation of the validity of the research methods adopted and the findings accumulated throughout the review is given in Appendix D. Research questions 1 and 2 share many similarities and are discussed in Section "Discussion of findings for research questions 1 and 2", summarised in Table 8. Research question 3 is discussed in Section "Discussion of findings for research question 3", summarised in Table 9. Section "Research questions contribution to knowledge" summarises the core contributions to knowledge from the findings of the research questions.

Discussion of findings for research questions 1 and 2
How can multivariate uncertainties be aggregated and represented through different probability distributions?
The analysis of papers to answer this question is presented in Sections "Uncertainty propagation and simulation techniques" and "Probability distributions for uncertainty analysis". Quantitative uncertainty analysis considers an aggregation of input parameter uncertainty whose value is derived from statistical data. Sensitivity analysis and Monte Carlo simulation are used to propagate uncertainty ranges over multiple PDFs along with correlation between inputs and respective degrees of freedom. The majority of solely quantitative approaches follow the standard GUM method, or an adaptation thereof.
The main qualitative analysis techniques combined the pedigree matrix, largely integrated in NUSAP, with quantitative assessment methods such as quantitative risk assessments and LCA. The former appreciated the need for multivariate considerations but there were no examples found of a combined approach. The latter applied sensitivity analysis to eliminate negligible inputs to alleviate the trade-off between measurement accuracy and implementation costs. However, uncertainty over the life cycle was considered constant, when in reality it is likely to fluctuate. The multivariate combination of quantitative and qualitative uncertainty is essential in real-world contextual applications to provide estimates of cost, availability and reliability with high levels of confidence.
The selection of the most appropriate PDF to represent a given uncertainty source is crucial in the analysis process [15,113]. Attributing qualitative factors as geometric standard deviation (GSD) enables the quantification and aggregation of multivariate uncertainties through an amalgamation of the pedigree matrix, Monte Carlo simulation and coefficient of variation (CV) [84,85]. The presented method can be applied to a range of PDFs.
Findings were qualified by referred sources and standardised methods for quantitative and qualitative uncertainty analysis. The probity of the amalgamation of these methods is considered unbiased since it can, in theory, be applied to multiple PDFs in multiple contexts. It also fulfils the outcome of the PICOC framework to determine relevant probability distributions and methods to quantify uncertainty that can be applied in industrial maintenance. The aggregation approaches compared in Table 8 are summarised as follows:
GUM [30,34,43,45,85,90]: standardised methods for quantitative aggregation (standard deviation); gives a standard 5-step process to identify, quantify and combine uncertainties; widely used with small variations in multiple applications. However, its use of effective degrees of freedom via the Welch-Satterthwaite formula for qualitative aggregation can lead to underestimation of the combined uncertainty; an improved method was presented by Willink [91].
NUSAP [16,96]: can be applied to simple calculations and complex models; uses pedigree to attribute qualitative estimates in a quantitative manner, suited to a broad range of applications; found to improve the depiction of uncertainty through visualisation and background knowledge compared to EQRAs. However, it is not clear how quantitative and qualitative estimates were combined explicitly.
Geometric standard deviation (GSD) and coefficient of variation (CV) [84,85,111,112]: estimates are represented under the lognormal distribution as GSD to eliminate scaling effects from different types of data; the CV enables aggregation of quantitative and qualitative uncertainties represented by different PDFs; uses pedigree to attribute qualitative estimates via GSD.
Willink method [45,91]: fits quantitative estimates to qualitative ones by attributing a known parent distribution to the quantitative estimates; qualitative estimates are represented by a known variance and 'coefficient of excess'; removes bias from the overall variance estimate. The "proposed method improves performance when some error components are drawn from non-normal distributions whose variances are obtained by non-statistical means".
Top-down approach (also known as the Nordtest approach or single-lab validation) [102]: broad-level method that does not go far into the measurement procedure and does not attempt to quantify all uncertainty sources individually, contrary to the GUM, though it follows the same 5-step process; instead, uncertainty sources are quantified in large "batches" via components that take several uncertainty sources into account; the uncertainty obtained characterises the analysis procedure rather than an explicit result; considers an uncertainty component for possible bias, determined against an uncertain reference value.
Other approaches examined were only applied in theory, prompting the need for further research in applied fields.
Alternative techniques may exist that were not covered in this review. This may be due to the scope of the initial search string and the robustness of the elimination process.
How can qualitative estimates driven by expert opinion and individual experiences be standardised and validated?
The analysis of papers to answer this question is presented in Section "Qualitative uncertainty analysis". Qualitative approaches applied in real-world cases are used in conjunction with quantitative methods such as Monte Carlo and sensitivity analysis. The pedigree matrix is one of the most widely used methods to validate qualitative attributes such as expert opinion and experience [15,16,84,85,95,96]. This requires the definition of pedigree criteria upon which the experience or qualifications of an 'expert' are scored and aggregated to attribute a quantitative measure of uncertainty. These criteria can be defined through surveys and interviews with industrial practitioners and academics. This approach has been adapted and implemented in a range of fields for various purposes [16,31,70,76,83,85,96]. Expert opinion and individual experiences can be validated against defined pedigree criteria to provide a standardised representation of uncertainty attributed by them.
The highly adaptive nature of the pedigree approach through NUSAP has allowed it to be implemented in multiple real-world contexts, while other approaches, such as applying effective degrees of freedom, are more likely to lead to lower confidence in the overall uncertainty estimate. The definition of criteria alleviates bias in the approach, though this should be made by a diverse selection of suitably qualified individuals. The pedigree approach was the only qualitative technique explored in detail as it was deemed best suited and widely accepted to fulfil the desired application. Other approaches or adaptations of pedigree may warrant further investigation, but the application through GSD and CV proposed by Ciroth et al. [85] and Muller et al. [84] appears best suited to fulfil research questions 1 and 2. These factors also achieve the outcome portion of the PICOC framework in Table 1 to identify methodologies to quantify qualitative uncertainty attributes and combine them with quantitative uncertainties.

Discussion of findings for research question 3
How can uncertainty be forecast over the in-service phase of an asset's life cycle and what are the key challenges faced in doing so?
The analysis of papers to answer this question is presented in Section "Uncertainty assessment and forecasting". Intelligent learning techniques are increasingly used to flexibly forecast uncertainty estimates in a range of fields, though applied methods for maintenance in-service are limited. The key challenges faced here, as in traditional uncertainty analysis, are the quality of available data and experience and knowledge surrounding data collection [56,131]. These challenges limit the ability to optimally train networks through probabilistic Bayesian learning, which reduces confidence and robustness in the associated uncertainty estimate. Alternative approaches to forecast uncertainty under limited data have been proposed such as deep uncertainty quantification (DUQ) [116], drop out learning [125] and spatial geometry [82].
Findings for this research question may be considered biased towards the context of cost estimation in PSS [6,7,9,21,106]. The assessment and forecasting approaches compared in Table 9 are summarised as follows:
Fuzzy set theory [15,19,31,49,89,98]: a membership function assigns a grade between 0 and 1 to each input parameter of a set, as opposed to Boolean values of 0 or 1; suitable for qualitative reasoning, but not for estimating quantitative uncertainty; often recommended in cases where recorded data and knowledge are lacking and available data is inherently subjective; used alongside NNs to aid uncertainty analysis.
Neural network (NN) with backpropagation (BPN) [15,17,56,109-111,113,125,132]: a flexible network of cooperating processing elements, applied to a model and 'trained' to give an optimum output; backpropagation computes the gradient of the loss function and uses it to change input parameters to reduce mistakes and optimise the output; reviewed applications concern learning capability and reliability in uncertainty prediction, giving greater confidence in maintenance cost estimates; BPN addresses the stabilisation of maintenance policies, since steady-state probabilities from stochastic models at inception may not reflect reality for forecasts.
GLUE method [69,127-129]: uses Bayesian inference and ensemble forecasting to assess uncertainty and the contribution (sensitivity) of factors for a forecast point in time; Monte Carlo simulation provides the decision-maker with information on expected uncertainty with a degree of confidence; allows identification of differences in model performance and quantification of parameter-induced uncertainty.
Deep uncertainty quantification (DUQ) [116]: combines deep learning and UQ to forecast multi-step meteorological time series; uncertainty is incorporated directly into the loss function and optimised through backpropagation; improves generalisation compared to mean squared error (MSE) and mean absolute error (MAE); regression is solved as a mapping problem rather than curve fitting, and so cannot be naturally applied to multi-step time-series forecasting.
Dropout as Bayesian approximation [125]: a theoretical framework casting dropout training in deep NNs as approximate Bayesian inference in deep Gaussian processes; Bayesian models require significant modification to train deep models, making them harder to implement and computationally slower; dropout training is instead used to approximate Bayesian inference, updating probability as more evidence becomes available; shows considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods such as BDL.
Spatial geometry [82,99]: forecasts cost uncertainty for a given point in time where available data is scarce, determined by the geometric symmetry of cost variance data at the time of estimation; represents uncertainty in a vector space, aggregated to give a probable cost variance in state space; an alternative to traditional parametric techniques where available data is not sufficient to fulfil the Central Limit Theorem.
Additional research is needed to examine how the assessed deep learning approaches can be applied for uncertainty assessment in industrial maintenance under limited data [98,108].
This requires a multivariate aggregation at present and a prediction of how the uncertainty may change through the in-service phase, considered for individual system components and for the system as a whole. This achieves the final outcome of the PICOC framework to identify methods to forecast uncertainty and optimise outputs as new information is acquired. The resulting forecasts can be utilised by decision-makers to determine where uncertainties may pose an undesirable risk over time, requiring mitigation to reduce the likelihood of unforeseen costs and delays.

Research questions contribution to knowledge
The analysis of synthesised literature to answer research question 1 (RQ1) in Section "Uncertainty propagation and simulation techniques" summarised the key UQ approaches used to undertake purely quantitative, purely qualitative and multivariate analysis. Section "Probability distributions for uncertainty analysis" identified PDFs best suited for uncertainty analysis applicable to industrial maintenance. Standardisation of qualitative factors to answer RQ2 in Sections "Qualitative uncertainty analysis" and "Multivariate uncertainty analysis" highlighted the use of the pedigree matrix to assign scores corresponding to uncertainty intervals [15,16]. These are attributed by their geometric standard deviation (GSD) to combine with quantitative estimates. To gauge these on an equivalent scale for aggregation, the respective coefficient of variation (CV) of each input is used as the uncertainty measure [84,85]. Systems in the reviewed context of emissions, energy & environment are inherently complex. Methods used must be flexible and are therefore likely to be transferable to industrial maintenance.
The analysis to answer RQ3 in Section "Uncertainty assessment and forecasting" highlighted the use of deep learning to forecast uncertainty. Methods to forecast individual and aggregated uncertainty arising from data availability, quality, experience and knowledge over time should be applicable under limited data, where traditional probabilistic Bayesian learning cannot be applied. Approaches such as deep uncertainty quantification (DUQ) [116], dropout learning [125] and spatial geometry [82] should be explored to make confident predictions of which uncertainties will pose undesirable risk throughout the in-service phase. Simulations of dynamic uncertainties in complex and non-complex systems can be made through surrogate models to estimate real-time and forecasted behaviour [108,110].
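The dropout-based idea can be illustrated with a minimal sketch: keeping dropout active at prediction time and treating the spread of the stochastic outputs as a model-uncertainty estimate, following the Bayesian interpretation in [125]. This is a toy example, not the cited authors' implementation: the single linear "network", its weights and the dropout rate are all hypothetical.

```python
import random
import statistics

def mc_dropout_predict(x, weights, p_drop=0.5, n_passes=200, seed=0):
    """Monte Carlo dropout on a toy linear model: run many stochastic
    forward passes with dropout active and return the predictive mean
    and standard deviation (the latter as an uncertainty estimate)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_passes):
        # Drop each weight with probability p_drop; rescale survivors
        # by 1/(1 - p_drop) so the expected output is unchanged.
        y = sum(
            w * xi / (1.0 - p_drop)
            for w, xi in zip(weights, x)
            if rng.random() >= p_drop
        )
        outputs.append(y)
    return statistics.mean(outputs), statistics.stdev(outputs)

# Toy input and hypothetical trained weights.
mean, std = mc_dropout_predict(x=[1.0, 2.0, 3.0], weights=[0.4, -0.2, 0.1])
```

In a real deep network the same principle applies per layer; the appeal noted in the reviewed work is that an uncertainty estimate is obtained from standard dropout training without the heavy modifications full Bayesian deep models require.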

Conclusions and future work
The purpose of this review was to investigate distinct methodologies used to quantify, aggregate and forecast uncertainty for real-world applications. Knowledge gaps within the research scope were highlighted, prompting the future research direction for dynamic uncertainties manifested in engineering systems to optimise performance and availability for the in-service phase.
Section "Introduction" hypothesised that current approaches considering a multivariate combination of factors will increase confidence and rigour in determining the impact of uncertainty over time under limited available data. The methodologies identified above for multivariate aggregation in theoretical and real-world applications, along with deep learning techniques to forecast uncertainty, have been shown to achieve this and consequently support the hypothesis.
Conclusions drawn from the discussion of approaches show that the aggregation and forecasting of uncertainty are hindered by the quality of available data, experience and knowledge. Modern engineering systems feature a myriad of subsystems interacting simultaneously and nonlinearly with each other, with levels of importance dependent on operational condition and system environment. Limited data concerning the optimisation of such systems and the interactions between them increases uncertainty throughout their in-service life. These systems typically operate under product-service system (PSS) contracts with multiple stakeholders, which presents challenges to confidently and accurately determining the level of uncertainty at present or in the future. These combined elements inhibit proficient decision-making and may lead to under- or over-estimation.
Multivariate techniques applied in the contextual domains of this review have highlighted similar challenges with limited tangible data and information. Uncertainty attributed to stakeholder relationships is a key challenge in delivering maintenance; however, this is largely tied into supply chains and therefore out of scope for this research [6,21]. Frameworks should be angled towards the core challenges that lead to uncertainty in the maintenance of engineering systems. The in-service life typically spans several years, prompting a need to accurately forecast and track changes in technical engineering uncertainties relating to cost and equipment availability.
From the findings of this review in answering the three research questions, two core research gaps were identified:
1. A lack of frameworks to aggregate multivariate uncertainty that can be applied in-service for increasingly complex engineering systems.
2. Limited approaches for forecasting individual and aggregated uncertainties in engineering systems with complex and non-complex entities under limited data.
Future work to close the first gap is recommended to develop robust frameworks that consider dependencies between multivariate inputs within increasingly complex system boundaries and identify which inputs have the greatest influence on the aggregated uncertainty. Flexibility in engineering systems design allows unpredictable unknown-unknowns to be mitigated (Fig. 1), which should be reflected in uncertainty frameworks. While many UQ approaches exist for purely quantitative scenarios, standardised methodologies to quantify multivariate uncertainty are limited in the manufacturing and maintenance context, especially for the in-service phase. The suitability of the pedigree matrix for determining qualitative uncertainty in the context of the research questions appears promising for the research direction in increasingly complex engineering systems.
The combination of traditional probability theory with deep learning will address the second gap and allow uncertainty to be forecast for real-world applications, incorporating complex and non-complex entities. Deep learning techniques will help to dynamically optimise flexible uncertainty forecasts when new data becomes available. The push to develop deep learning methods to forecast uncertainty is gaining importance as data volumes, computational capability and the complexity of engineering systems increase.
Maintenance processes should be simulated through surrogate models, incorporating the identified challenges to execute frameworks to quantify, aggregate and forecast the resulting uncertainties. Data collected from simulations can then be incorporated to train developed frameworks to confidently aggregate and forecast multivariate uncertainty.

Conflict of interest
None declared.

Acknowledgements
This project is based on the collaboration between the Throughlife Engineering Services Centre (TES) at Cranfield University (UK) and BAE Systems. The authors would like to thank the Engineering and Physical Sciences Research Council (EPSRC), project ref.
1944319, and Doctoral Training Partnership (DTP) for funding this research. The data that supports the findings of this study are available upon request from the corresponding author.

Appendix A. Search string and results
Search string: ("Uncertainty quantification" AND ("aggregation" OR "industrial maintenance" OR "forecasting" OR "challenges" OR "complex engineering systems"))

Appendix C. Snapshot of word frequency count matrix for the thematic synthesis using Excel VLOOKUP functions

Appendix D. Research methods validity and neutrality
The validity of research methods is distinguished here as the extent to which they achieve the objectives. Neutrality is the measure taken to avoid bias and to increase the transparency and replicability of the research. The following points examine these traits for the frameworks and methods adopted in this review.
Systematic review procedure: The SALSA framework was adopted to carry out the review procedure due to its contextual flexibility and validity, as well as its successful implementation in other systematic reviews [22][23][24][25].
Scoping framework: The PICOC framework (Table 1) was adopted to scope the research and define the aim, objectives and research questions, as it provides a transparent and duplicable identification of key concepts to be implemented in the SALSA framework.
Literature search: The PICOC framework was used to construct, refine and enhance the search string (Appendix A. Search string and results). Literature deemed to encapsulate the scope of the research criteria was selected for assessment in the appraisal phase.
Appraisal: Inclusion and exclusion criteria were defined through the research scope and PICOC framework, as well as examples in the literature [22,23,66]. Publications were eliminated on a basis of format (accessibility), duplication, title, abstract, date, introduction/conclusion and full-text reading (Fig. 3) according to these criteria, via the authors' interpretation of their relevance. The remaining papers were deemed most relevant to answer the research questions. Data management was upheld using the data extraction table described in Section "Data extraction".
Synthesis: Themes and categories were established through the repetitive word counting process described in Section "Synthesis of extracted data". This reproducible process was validated and refined by comparison with other reviews and academic feedback [24,25,67,68].
Analysis: A combination of thematic, narrative, tabular and graphical approaches was adopted to examine the literature and answer the research questions. Types of uncertainty were discussed in Section "Topology of engineering systems and uncertainty" to provide context for the research scope.