A sensitivity analysis of the PAWN sensitivity index

The PAWN index is gaining traction in the modelling community as a moment-independent method for global sensitivity analysis. However, it has so far been used without knowing how robust it is to its main design parameters, which need to be defined ab initio by the analyst: the size ($N$) and sampling ($\varepsilon$) of the unconditional model output, the number of conditioning intervals ($n$) and the summary statistic ($\theta$). Here we fill this gap by running a sensitivity analysis of a PAWN-based sensitivity analysis. We show that PAWN is highly sensitive to the setting of $(N, n, \varepsilon, \theta)$, and that this uncertainty creates non-negligible chances of PAWN producing non-robust results in a factor prioritization or factor screening context. Increasing the precision of PAWN is a complex affair due to the existence of important interactions between $(N, n, \varepsilon, \theta)$, which we found significant up to the third order. Even in an ideal setting in which the optimum choice for $(N, n, \varepsilon, \theta)$ is known in advance, PAWN might not allow the analyst to distinguish a truly influential, non-additive model input from a truly non-influential model input.


Introduction
Pianosi and Wagener [1,2] have recently proposed a new measure for sensitivity analysis, the PAWN index. Like other moment-independent approaches (e.g. entropy-based [3] and density-based [4,5] methods), PAWN does not resort to statistical second-order moments such as the variance to apportion output uncertainty to the model parameters. Instead, it relies on Cumulative Distribution Functions (CDFs) to characterize the maximum distance between the unconditional output distribution $Y_U$, i.e. obtained by moving all parameters simultaneously, and the conditional output distributions $Y_{C_{ij}}$, i.e. obtained by fixing the $i$-th parameter to $j = 1, 2, ..., n$ values or intervals within its uncertainty range. The difference between $Y_U$ and $Y_{C_{ij}}$ is assessed via the Kolmogorov-Smirnov (KS) test, although other distance-based tests, such as the Anderson-Darling test, may also be used [6]. The final PAWN index for a given parameter ($T_i$) is obtained by calculating the mean, the median, the maximum or any other summary statistic over all the KS values computed between $Y_U$ and $Y_{C_{ij}}$.
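The procedure just described can be sketched in a few lines. The following is an illustrative Python implementation (the paper's own code is written in R): the `model` and `dists` interfaces and the equal-frequency interval split are assumptions for the sketch, not the authors' exact algorithm.

```python
import numpy as np

def ks_distance(a, b):
    """Kolmogorov-Smirnov statistic: maximum vertical distance
    between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.abs(cdf_a - cdf_b).max()

def pawn_index(model, dists, i, N=2000, n=10, stat=np.median, seed=None):
    """Sketch of the PAWN index T_i for input i.

    model : callable mapping an (N, k) array to N scalar outputs
    dists : list of callables, dists[j](rng, size) -> samples of X_j
    """
    rng = np.random.default_rng(seed)
    # Unconditional output Y_U: all inputs vary simultaneously
    y_u = model(np.column_stack([d(rng, N) for d in dists]))
    ks = []
    # Split X_i's range into n conditioning intervals (equal frequency)
    for chunk in np.array_split(np.sort(dists[i](rng, N)), n):
        Xc = np.column_stack([d(rng, N) for d in dists])
        Xc[:, i] = rng.choice(chunk, size=N)     # X_i confined to the interval
        ks.append(ks_distance(y_u, model(Xc)))   # KS(Y_U, Y_C_ij)
    return stat(ks)  # summary statistic theta over the n KS values
```

For instance, with the Ishigami and Homma [7] function, `pawn_index(ishigami, dists, 1)` would return the PAWN index of $X_2$ under the chosen summary statistic.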
We selected the distributions of $(N, n, \varepsilon, \theta)$ based on previous work on the PAWN index [1,2] and some preliminary tests. Firstly, we observed that a sample size of $N = 2000$ was sufficient for the PAWN index of almost every model input in all functions to converge (Fig. S1). We followed Sarrazin et al. [15] and considered that a given PAWN index has converged when the width of its confidence interval, $T_i^{ub} - T_i^{lb}$, falls below a predefined threshold, where $T_i^{ub}$ and $T_i^{lb}$ are the upper and lower bounds of the PAWN index for the $i$-th parameter after bootstrapping 1000 replicas. By defining $N \sim U(50, 2000)$ we set our study in a scenario where the uncertainty regarding the sample size required to obtain robust PAWN indices is moderate.
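The convergence check amounts to a short bootstrap routine. In this sketch the 0.05 threshold is purely illustrative, not a value taken from the paper:

```python
import numpy as np

def bootstrap_bounds(t_replicas, alpha=0.05):
    """Upper and lower bounds (T_i^ub, T_i^lb) of a PAWN index,
    taken from bootstrap replicas of T_i."""
    lb, ub = np.percentile(t_replicas, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lb, ub

def has_converged(t_replicas, threshold=0.05):
    """Convergence in the spirit of Sarrazin et al. [15]: the interval
    width T_i^ub - T_i^lb must fall below a predefined threshold
    (the 0.05 default here is an assumption)."""
    lb, ub = bootstrap_bounds(t_replicas)
    return bool((ub - lb) < threshold)
```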
We defined the distribution of $n$ based on Pianosi and Wagener [2], who suggested starting with $n = 10$ and varying $n$ a few units up and down to check its effect (provided that $n > 5$). Regarding $\varepsilon$, we set $10^3$ different starting points (seeds) for the pseudo-random number sequence used to generate the indices (from 1 to $N$) to sample $Y_U$. This ensured 1) negligible chances of the same seed coinciding with the same value of $N$, which would introduce determinism into a process that should be mainly stochastic, and 2) that the randomness in the sampling of $Y_U$ is assessed in terms of its relative influence on the computation of PAWN. Finally, for $\theta$ we used the mean, the median and the maximum as the summary statistic for the KS values when $\theta = 1$, $\theta = 2$ and $\theta = 3$ respectively.
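As a concrete illustration, the design-parameter space just described could be sampled as below. The exact bounds for $n$ are an assumption inferred from the text (varied around 10, with $n > 5$), not values quoted from the paper's Table 1:

```python
import numpy as np

# theta codes as in the text: 1 = mean, 2 = median, 3 = max
SUMMARY_STATS = {1: np.mean, 2: np.median, 3: np.max}

def sample_design(rng):
    """Draw one (N, n, eps, theta) tuple; the bounds for n are assumed."""
    N = int(rng.integers(50, 2001))    # N ~ U(50, 2000)
    n = int(rng.integers(6, 16))       # n > 5, varied around 10 (assumption)
    eps = int(rng.integers(1, 1001))   # one of 10^3 seeds for sampling Y_U
    theta = int(rng.integers(1, 4))    # summary statistic code
    return N, n, eps, SUMMARY_STATS[theta]
```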
In order to estimate the uncertainty propagated by $(N, n, \varepsilon, \theta)$ to the PAWN indices, we created an $(8000, 2k)$ sample matrix for each function using Sobol' quasi-random number sequences, where $k = 4$, and transformed the columns into their appropriate distributions (Table 1). The first $k$ columns formed the matrix A and the second $k$ columns the matrix B. Our model ran row-wise in both the A and B matrices, as follows: based on the information contained in the $v$-th row, it created a Sobol' matrix of size $N^{(v)}$, and computed either the Liu et al. [3], the Ishigami and Homma [7], the Sobol' G [12] or the Morris [13] function. Then, for each model run, it computed the PAWN index of every model input.

We estimated how sensitive PAWN indices are to uncertainty in $(N, n, \varepsilon, \theta)$ by means of Sobol' indices [16]. For a model of the form $Y = f(X_1, X_2, ..., X_k)$, where $Y$ is a scalar and $X_1, X_2, ..., X_k$ are independent parameters described by known probability distributions, we can measure how sensitive $Y$ is to a given parameter $X_i$ as

$$V_i = V_{X_i}\left(E_{X_{\sim i}}(Y | X_i)\right), \quad (5)$$

where $E_{X_{\sim i}}(Y | X_i)$ is the expected value of $Y$ calculated over all possible values of all parameters except the $i$-th, which is kept fixed. By dividing Equation 5 by the unconditional model output variance $V(Y)$, we obtain the first-order sensitivity index for $X_i$, which describes the proportion of variance in the model output caused by $X_i$:

$$S_i = \frac{V_i}{V(Y)}. \quad (6)$$

We can then decompose the unconditional model output variance $V(Y)$ as the sum of conditional variances up to the $k$-th order:

$$V(Y) = \sum_i V_i + \sum_i \sum_{j>i} V_{ij} + ... + V_{12...k}, \quad (7)$$

where

$$V_{ij} = V_{X_i, X_j}\left(E_{X_{\sim ij}}(Y | X_i, X_j)\right) - V_i - V_j.$$

From this, we can derive the second-order index $S_{ij}$, which explains the proportion of variance due to the interaction between $X_i$ and $X_j$,

$$S_{ij} = \frac{V_{ij}}{V(Y)},$$

and so on up to order $k$. However, estimating all terms in Equation 7 is unattainable when $k$ is large, as they amount to $2^k - 1$.
In this case, we can compute the total-order index $S_{T_i}$, which measures the proportion of variance due to the first-order effect of $X_i$ jointly with its interactions with the other parameters [17]:

$$S_{T_i} = \frac{E_{X_{\sim i}}\left(V_{X_i}(Y | X_{\sim i})\right)}{V(Y)} = 1 - \frac{V_{X_{\sim i}}\left(E_{X_i}(Y | X_{\sim i})\right)}{V(Y)}.$$

Here we computed first- ($S_i$), second- ($S_{ij}$), third- ($S_{ijk}$) and total-order ($S_{T_i}$) Sobol' indices of $(N, n, \varepsilon, \theta)$ using the Jansen [18] estimators.
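Under the A/B matrix scheme described above, the Jansen [18] estimators of $S_i$ and $S_{T_i}$ can be written compactly. This Python sketch (the paper's own code is in R) assumes a vectorised `model` and is an illustration, not the authors' implementation:

```python
import numpy as np

def jansen_indices(model, A, B):
    """First- and total-order Sobol' indices via the Jansen [18] estimators.

    A, B : independent (N, k) sample matrices; AB_i equals A with its
    i-th column replaced by the i-th column of B.
    """
    N, k = A.shape
    y_a, y_b = model(A), model(B)
    var_y = np.var(np.concatenate([y_a, y_b]))  # unconditional variance V(Y)
    S_i, S_Ti = np.empty(k), np.empty(k)
    for i in range(k):
        AB = A.copy()
        AB[:, i] = B[:, i]
        y_ab = model(AB)
        # Jansen: V_i      = V(Y) - (1 / 2N) * sum((f(B) - f(AB_i))^2)
        S_i[i] = (var_y - np.mean((y_b - y_ab) ** 2) / 2) / var_y
        # Jansen: E(V_~i)  = (1 / 2N) * sum((f(A) - f(AB_i))^2)
        S_Ti[i] = np.mean((y_a - y_ab) ** 2) / 2 / var_y
    return S_i, S_Ti
```

Each additional input costs only one extra batch of $N$ model runs, which is why the total budget of the A/B scheme is $N(k + 2)$ runs.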
All our workflow is summarised in Fig. S2. We also computed the first and total-order Sobol' indices of each model input in each test function (Fig. 2). The well-established Sobol' method provides a benchmark against which the robustness of the PAWN index can reliably be compared. Fig. 2 will be of use when discussing how uncertainty in the PAWN design parameters propagates to the PAWN index itself and undermines its capacity to provide solid results in a factor prioritization/factor screening setting (see below) [19]. The R code to replicate the whole workflow is available on GitHub.

Fig. 3 presents the uncertainty distribution of the PAWN indices ($T_i$) for each model input and function. We ran three simulations: the first with $\theta$ including the mean, the median and the maximum as possible summary statistics ($\max \in \theta$), and the second with $\theta$ including the mean and the median only ($\max \notin \theta$). The latter aimed at isolating the effect that extreme KS values, which might be obtained for specific conditioning intervals, have on the final PAWN index. We ran the third simulation with $N \sim U(5000, 10^4)$, $n \sim U(15, 20)$ and $\max \notin \theta$ in order to check how much uncertainty the PAWN index embeds once its design parameters are left to fluctuate within optimum bounds, i.e. sample sizes that ensure convergence (Fig. S1). This latter run reflects an ideal scenario, one in which the best values of the PAWN design parameters are known beforehand.

Uncertainty analysis
The selection between $\max \in \theta$ and $\max \notin \theta$ determines whether the uncertainty distribution of some PAWN indices becomes unimodal or bimodal. This is the case for all model inputs in the Liu et al. [3] function and for $X_1$, $X_2$ in the Sobol' [12] G function (Fig. 3). The upper mode arises when $\max \in \theta$ and reflects the uncertainty in $T_i$ when the most extreme KS value is used as the summary statistic. The lower mode displays the uncertainty in $T_i$ when $\max \notin \theta$, that is, when the summary statistic is a central tendency measure.
In a factor prioritization context, i.e. when the aim is to sort the parameters according to their contribution to the model output variance [19], including $\max \in \theta$ raises the likelihood of producing a biased ranking. For instance, 28% of the probability density area of $X_1$ and $X_2$ overlaps if $\max \in \theta$ in the Liu et al. [3] function (17% if $\max \notin \theta$), although Liu et al. [3] show $X_1$ to be more influential than $X_2$. In the Sobol' [12] G function, the overlap between $X_1$ and $X_2$ is 10% (1.7% if $\max \notin \theta$) and that between $X_2$ and $X_3$ is 23% (14% if $\max \notin \theta$), making the probability of reversing the ranking of some influential model inputs non-negligible (Figs. 2-3).
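The degree of overlap between two uncertainty distributions can be quantified as the probability mass they share, i.e. the integral of the pointwise minimum of their densities. This histogram-based sketch is one way of estimating it, not necessarily the estimator used in the paper:

```python
import numpy as np

def overlap_coefficient(a, b, bins=50):
    """Overlap between two empirical distributions: the integral of the
    pointwise minimum of their histogram-estimated densities, in [0, 1]."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Common support and bin edges for both samples
    pa, edges = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    width = edges[1] - edges[0]
    return float(np.sum(np.minimum(pa, pb)) * width)
```

A value of 1 means the two distributions are indistinguishable; values near 0 mean the two model inputs can be ranked or screened apart with confidence.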
In a factor screening context, i.e. when the aim is to distinguish influential from non-influential parameters [19], the uncertainty in the PAWN design parameters might also lead to biased results, regardless of whether $\max \in \theta$ or $\max \notin \theta$. In the case of the Sobol' [12] G function, 14-18% (11-14%) of the probability density of $X_2$, which has a non-zero effect, overlaps with that of $X_4, ..., X_8$, which have no effect at all. This defective screening is exacerbated in the case of the Morris [13] function: 17-20% (12-13%) of the probability density of $X_8, ..., X_{10}$, which have a moderate first-order effect, overlaps with that of $X_{11}, ..., X_{20}$, which are non-influential. The chances of mistaking relevant for non-relevant parameters are even higher for parameters whose influence on the model output is through interactions only: this is the case of $X_1, ..., X_6$ in the Morris [13] function, whose degree of overlap with the non-influential $X_{11}, ..., X_{20}$ ranges between 70-90% (Figs. 2-3).

(Figure caption fragment: only $S_{T_i}$ whose lower 95% confidence interval does not overlap with the approximation error $S^*_{T_i}$ are considered truly influential [6]; the approximation error for $S_i$, $S^*_i$, is smaller than 0.003 and is not shown in the plot.)

Fig. 3 also shows that, once the PAWN design parameters are set to their optimal range, the overlap between the model inputs is considerably reduced. In the case of the Liu et al. [3] and the Ishigami and Homma [7] functions, the percentage of overlap goes down to zero. However, the chances of wrongly screening the model inputs remain non-negligible for both the Sobol' [12] G and the Morris [13] functions. In the former, there is a 13% overlap between the slightly influential model input $X_3$ and $X_4$, whose effect cannot be differentiated from the approximation error. With regard to the Morris [13] function, the chances of characterizing as non-influential parameters that have a significant non-additive effect on the model output remain high: the overlap of $X_1, ..., X_5$ and of $X_6$ with $X_{11}, ..., X_{20}$ stays between 15-27% and 80-90% respectively (Figs. 2-3, Fig. S3).

Sensitivity analysis
Fig. 4 presents the Sobol' first-order ($S_i$) and total-order ($S_{T_i}$) indices for $\max \in \theta$ and $\max \notin \theta$ after pooling the values from all functions and parameters (the Sobol' indices for each function and parameter are shown in Figs. S4-S5). The boxplots thus inform on how much each of the PAWN design parameters contributes uncertainty to the PAWN index. In Fig. S6 of the Supplementary Information file, we show that the results displayed in Fig. 4 are robust without having to offset the stronger weight that the Morris function might have in defining the trends due to its much larger number of model inputs.
As shown in Fig. 4a, the first-order effect of $\theta$ and $N$ is much more variable in the $\max \in \theta$ setting, suggesting that their degree of contribution to the PAWN index uncertainty might be considerably function-dependent. This variability is greatly reduced in the $\max \notin \theta$ setting, which provides a more robust account of the extent to which each design parameter contributes to defining the PAWN index. In this setting, the selection of the initial sample size ($N$) and the number of conditioning intervals ($n$) conveys c. 75% and 15% of the PAWN index uncertainty respectively. The stochasticity in the sampling of $Y_U$ ($\varepsilon$) is influential through interactions in both settings, whereas the selection of the summary statistic ($\theta$) has a nearly null effect in the $\max \notin \theta$ setting. Remarkably, a significant proportion of the PAWN index uncertainty in both settings is explained by interactions between its design parameters (Fig. 4b).
To gain further insight into the structure of these non-additivities, we computed second- and third-order effects, shown in Fig. 5 (the second- and third-order Sobol' indices for each function and parameter are shown in Figs. S7-S10). The interactions that have a significant effect on the model output, regardless of whether $\max \in \theta$ or $\max \notin \theta$, involve the initial sample size ($N$) with the number of conditioning intervals ($n$) or with the stochasticity in the sampling of $Y_U$ ($\varepsilon$). Such second-order effects might contribute up to 15% of the PAWN index uncertainty. As for significant third-order effects, the interactions between $N, n, \theta$ and between $N, n, \varepsilon$ might convey up to 15% and between 5-10% of the PAWN uncertainty in the $\max \in \theta$ and $\max \notin \theta$ settings respectively.

Discussion and conclusions
Two elements emerge from our work: the PAWN index is sensitive to its design parameters, and this sensitivity has a complex pattern which makes the use of PAWN, or rather the tuning of the PAWN design parameters, a delicate affair. The chances of wrongly ranking or screening the model inputs are significant even when the design parameter space does not include the maximum as a possible summary statistic, but only measures of central tendency. Even in an ideal setting, where the optimum choices for $(N, n, \varepsilon, \theta)$ are known before actually running the sensitivity analysis, the PAWN index might be incapable of differentiating between non-influential model inputs and influential model inputs whose effect on the model output is fully through interactions.
The fact that the sensitivity of the PAWN index to its design parameters is complex, including important interactions up to the third order, implies that finding the right range of design parameters to use PAWN safely and efficiently is not easy. The non-additivity of PAWN only unfolds once the values of its main design parameters are moved simultaneously within reasonable uncertainty ranges, that is, when all the forking paths and divergences leading towards its computation are assessed at once. Instead, in the paper where the "generic approach" is presented, Pianosi and Wagener [2] analysed the influence of $(N, n, \theta)$ on PAWN by combining different discrete point-estimates for $N$ and $n$ (see their Fig. 3), or by changing the value of either $N$ or $n$ while keeping the other design parameters fixed (see their Figs. 6 and 7). This approach is very similar to a one-at-a-time (OAT) sensitivity analysis, a method that cannot detect interactions between model parameters due to its incomplete examination of the uncertainty space [21], akin to attempting to describe a labyrinth after having walked only a single path.
The present findings do not suggest discarding PAWN as a sensitivity measure. Moment-independent measures have a role to play in the sensitivity analysis of outputs with long-tailed distributions. Additionally, they may find an ideal use in settings where the out- [...]

(Figure caption fragment: the larger the distance between the boxplots and 1, the larger the role of non-additivities in defining the PAWN index [20].)

(Figure caption fragment: only PAWN indices whose lower 95% confidence interval does not overlap with the grey frame can be considered truly influential [2]. Note, for instance, how the measurement error in computing PAWN does not allow differentiating $X_1, ..., X_7$ from $X_{11}, ..., X_{20}$ in the Morris function, which are respectively influential and non-influential according to Sobol' indices (Fig. 2).)

[7] T. Ishigami and T. Homma. "An importance quantification technique in uncertainty analysis for computer models". Proceedings, First International Symposium on Uncertainty Modeling and Analysis 12 (1990), 398-403.