Delft University of Technology Update (1.1) to ANDURIL—A MATLAB toolbox for ANalysis and Decisions with UnceRtaInty: Learning from expert judgments ANDURYL

This is an update to PII: S2352711018300608. In this paper, we discuss ANDURYL, which is a Python-based open source successor of the MATLAB toolbox ANDURIL. The output of ANDURYL is in good agreement with the results obtained from ANDURIL and EXCALIBUR. Additional features available in ANDURYL, and not available in its predecessors, are discussed. © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

Table 1. Overview of result comparison AI and AY against CC. Columns: Software; Number of studies compared; Number of different scores in Table 2; Number of scores with approximation differences.

Motivation and significance
A MATLAB toolbox named ANDURIL (AI), implementing Cooke's classical model [1] for structured expert judgment, is presented in [2]. Until recently, EXCALIBUR (CC) was the only available software implementing Cooke's classical method. Though Eggstaff's studies were based on a MATLAB implementation [3,4], the source code developed for these studies is not available for distribution.
In this paper we present ANDURYL (AY), which is a Python [5] implementation of Cooke's classical model [1]. The program name, which replaces the I with a Y, indicates that the AY source is based on Python instead of MATLAB. The program structure of AI has been retained in this implementation. The most obvious advantage of AY is that the MATLAB license required for AI is not required for AY. Other features added with respect to AI are discussed throughout this paper.

Software description
AY does not have a graphical user interface and is run from the command line with the Python script main.py. Users can adapt the code to run their own studies in sequence, as presented in anduryl_example.py. The program is set up around one main Python function, anduryl, which is used to run the full scope of AY. In this main script, the data obtained from expert judgments may be entered in order to conduct the desired analysis. The input variables are set as global variables and backed up. With 'restore' statements the variables can be reset to the original input values, which can be used in later calculations and might also be useful in further developments of AY. In the current implementation, this is used in the process for investigating the robustness of the obtained Decision Makers (DM).
The functions of AY are similar to the functions presented for AI, and AY keeps its architecture as similar as possible to that of AI. The main difference is the function calculate_weights, which merges AI's functions global_weights and item_weights. A more detailed explanation of the program is presented in the Supplement. The remaining differences are discussed further in Section 4. Next we present results of comparing AY's output to both CC and the MATLAB implementation AI.

Comparing output of ANDURYL with previous expert judgment studies
In [4], 33 post-2006 studies using Cooke's classical method are presented using CC. We use these data to compare output from AY to both CC and the MATLAB implementation AI of the previous paper [2]. Table 2 presents the results reported in Table 1 of [4] (the study name followed by CC) extended with calculations from AI (AI) and AY (AY). Table 2 includes the statistical accuracy (SA), information (In) and the combined scores (Co).
Equal weight, Global weights without optimization (Global No Op.), Global weights optimized (PW Global), Item weights optimized (PW Item) and the expert with highest combined score (Best Expert) are presented. In the supplement, an extended table including Item weights without optimization (Item No Op.) and the expert with the lowest combined score is presented.
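As a rough illustration of the equal-weight and global-weight schemes listed above, the following sketch assumes the usual formulation of Cooke's classical model, in which an expert's unnormalized global weight is the product of statistical accuracy and information, and experts whose statistical accuracy falls below a cutoff alpha receive zero weight. The function names and the example scores are hypothetical and are not taken from the AY source. The search for the optimal alpha is not shown.

```python
def equal_weights(n_experts):
    """Every expert receives the same weight."""
    return [1.0 / n_experts] * n_experts


def global_weights(sa, info, alpha=0.0):
    """Normalized global weights (assumed form: SA x information).

    sa    -- statistical accuracy per expert
    info  -- overall information score per expert
    alpha -- cutoff; experts with SA below alpha get zero weight.
             alpha=0.0 corresponds to the case without optimization.
    """
    raw = [s * i if s >= alpha else 0.0 for s, i in zip(sa, info)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all experts were excluded by the cutoff")
    return [r / total for r in raw]


# Illustrative scores for three hypothetical experts (not from any study).
sa = [0.5, 0.01, 0.2]
info = [1.0, 2.0, 1.5]
weights_no_cutoff = global_weights(sa, info)
weights_with_cutoff = global_weights(sa, info, alpha=0.05)
```

With the cutoff at 0.05 the second expert drops out and the remaining weight is redistributed over the other two.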
Of the 33 studies reported in [4], 11 were performed using 5 quantiles and 3 used quantiles other than the 5th, 50th and 95th or contained missing items for some experts. These results cannot be compared with AI and are marked by (*). For the EBPP study, a software error appeared in the MATLAB code. This error will be resolved in a future update of AI. Hence, a total of 18 studies were compared with AI. Each study in Table 2 presents 17 numbers, which gives 18 × 17 = 306 numbers for the comparison with AI. Differences between the calculations reported in [4] and AI are highlighted in blue. There are a total of 13 blue numbers in Table 2 and hence an agreement of $\left(1 - \frac{13}{306}\right) \times 100 \approx 96\%$ between AI and the calculations reported in [4] for the studies that can be compared. Of the 13 numbers, 4 are clearly approximation differences. Note that although the numbers in CC are MATLAB-based, we compare our results to the published results in [4], and the authors have no way to investigate further the approximation used in [4]. Additionally, 9 numbers are equal to the results obtained with AY. These two observations would bring the agreement to 100%.
Differences between the calculations reported in [4] and AY are highlighted in red in the same table. There are a total of 23 red numbers in Table 2 and hence an agreement of $\left(1 - \frac{23}{561}\right) \times 100 \approx 96\%$ between AY and the calculations reported in [4].
Of the 23 red numbers, 8 are clearly approximation differences. Additionally, 9 AY results are equal to those obtained with AI, which would bring the agreement to ≈99%. These results indicate that both AI and AY may be used with sufficient confidence by interested users.
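The agreement percentages quoted above follow from simple arithmetic on the counts given in the text (17 numbers per study, 18 studies for AI, 33 for AY). The short sketch below reproduces it; the function and variable names are only illustrative.

```python
SCORES_PER_STUDY = 17  # numbers reported per study in Table 2


def agreement_percent(n_studies, n_different, scores_per_study=SCORES_PER_STUDY):
    """Percentage of compared numbers that agree with the values in [4]."""
    total = n_studies * scores_per_study
    return (1 - n_different / total) * 100


# Raw agreement: all highlighted numbers count as differences.
ai_raw = agreement_percent(18, 13)  # 13 differences out of 306
ay_raw = agreement_percent(33, 23)  # 23 differences out of 561

# Adjusted agreement: discount approximation differences and the
# 9 numbers on which AI and AY agree with each other.
ai_adjusted = agreement_percent(18, 13 - 4 - 9)
ay_adjusted = agreement_percent(33, 23 - 8 - 9)

print(f"AI: {ai_raw:.1f}% raw, {ai_adjusted:.1f}% adjusted")
print(f"AY: {ay_raw:.1f}% raw, {ay_adjusted:.1f}% adjusted")
```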
The results of the comparison are summarized in Table 1.
In Table 2, 9 values are equal for AI and AY but different compared to CC. The authors checked the input files of the "Icesheets" study. It was found that the realization file (*.rls) and the file with assessments (*.dtt) presented inconsistencies in the labeling of assessment questions. We speculate that this could be the source of this misalignment of both AI and AY with CC.
The differences found in the "Gerstenberger", "Goodheart" and "Hemopilia" studies are related to the optimization process. For example, for the "Goodheart" data the optimization process in CC yields 1 expert as the optimal combination, whereas for both AI and AY the optimal combination consists of 3 experts. Without the source code of CC, the authors cannot investigate this source of misalignment further.
Table 2. Comparison of results presented in Table 1 of [4] (CC) and calculations with AI (AI) and AY (AY).
a The authors found a software error in AI; this particular study has not been validated against AI. The software error will be fixed in a future update of AI.

Impact
The advantages of AI, discussed in [2], with respect to CC are inherited by AY. A number of limitations of AI were discussed in the supplement of [2]. Besides the full open source character using Python as a programming language, two other advantages were implemented in comparison with CC and/or AI. These are elaborated further next.

User defined quantiles
From Table 2 it may be observed that AY presents good agreement with the 11 studies reported in [4] where 5 quantiles (5th, 25th, 50th, 75th and 95th) were used to elicit expert judgments, hence we do not elaborate further on this issue.
As stated earlier, AY provides the option of user defined quantiles. CC allows for the use of 3, 4 or 5 user defined quantiles. Fig. 1 presents a hypothetical example of 4 experts, A, B, C and D, assessing 10 calibration or seed variables. The realization (R) is also shown.
Intuitively, the reader may already appreciate that expert A will be informative but with low SA. Expert B will be less informative and also present low SA. The SA for C and D will be equal, however, D will be more informative than C. Table 3 presents a comparison of the calculations of SA and informativeness between AY and CC assuming experts elicited 10th, 50th and 90th percentiles of their uncertainty distribution. The reader may appreciate that the agreement between the calculations performed by CC and AY is almost exact.
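To make the role of the quantile levels concrete, the sketch below computes a statistical accuracy score for arbitrary user defined quantiles. It assumes the standard formulation of Cooke's classical model (relative information of the empirical interquantile frequencies with respect to the theoretical ones, scored with a chi-square distribution) and is not taken from the AY or CC source. The function names, the tie-handling convention and the closed-form chi-square tail are choices made for this illustration; `scipy.stats.chi2.sf` would be the usual alternative to the hand-written tail function.

```python
import math
from bisect import bisect_left


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in chi = sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(chi / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def statistical_accuracy(assessments, realizations, quantile_levels):
    """Statistical accuracy of one expert over a set of seed variables.

    assessments     -- per item, the expert's quantile values (ascending)
    realizations    -- per item, the realized value
    quantile_levels -- e.g. (0.10, 0.50, 0.90) for 10th/50th/90th
    """
    levels = [0.0] + list(quantile_levels) + [1.0]
    p = [hi - lo for lo, hi in zip(levels, levels[1:])]  # theoretical bin mass
    counts = [0] * len(p)
    for quantiles, realization in zip(assessments, realizations):
        # Assumed convention: a realization equal to a quantile
        # is counted in the lower bin.
        counts[bisect_left(quantiles, realization)] += 1
    n = len(realizations)
    s = [c / n for c in counts]  # empirical bin frequencies
    rel_info = sum(si * math.log(si / pi) for si, pi in zip(s, p) if si > 0)
    return chi2_sf(2.0 * n * rel_info, df=len(p) - 1)
```

Because the bin probabilities are derived from `quantile_levels`, the same function applies unchanged to 3, 5 or 7 quantiles.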
Because the source code of AY is available and extended with respect to CC, practitioners may use more than 3, 4 or 5 user defined quantiles to elicit expert judgments. The same hypothetical example with four experts as in Table 3 is used, but with experts assessing 7 quantiles (10th, 25th, 35th, 50th, 65th, 75th and 90th).
Table 4. Statistical accuracy and informativeness computed with AY with 7 quantiles for the hypothetical example presented in Section 4.1, assuming experts elicited the 10th, 25th, 35th, 50th, 65th, 75th and 90th percentiles of their uncertainty distribution.
Though this option is available in AY, its applicability in practice is unclear to the authors, since the complexity of eliciting expert judgments grows significantly with the number of quantiles to be elicited from experts. It is also unclear to the authors whether the absence of studies eliciting more than 5 quantiles is due to this feature not being available in any software implementation.
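The information score generalizes to any number of quantiles in the same way. The sketch below assumes the standard definition from Cooke's classical model: the relative information of the expert's piecewise-uniform distribution with respect to a uniform background measure on an intrinsic range that extends the observed range by an overshoot (10% is the conventional default). It is an illustration only. The function names are hypothetical, and a log-uniform background measure is not covered.

```python
import math


def intrinsic_range(all_quantiles, realization=None, overshoot=0.1):
    """Range of all assessed quantiles (and the realization, if known)
    for one item, widened on both sides by the overshoot fraction."""
    values = [q for quantiles in all_quantiles for q in quantiles]
    if realization is not None:
        values.append(realization)
    low, high = min(values), max(values)
    margin = overshoot * (high - low)
    return low - margin, high + margin


def information_score(quantiles, quantile_levels, lower, upper):
    """Relative information of one assessment w.r.t. a uniform background.

    quantiles must be strictly increasing and lie strictly inside
    (lower, upper); otherwise a bin has zero width.
    """
    levels = [0.0] + list(quantile_levels) + [1.0]
    edges = [lower] + list(quantiles) + [upper]
    width = upper - lower
    score = 0.0
    for k in range(len(levels) - 1):
        p = levels[k + 1] - levels[k]            # expert's mass in bin k
        r = (edges[k + 1] - edges[k]) / width    # background mass in bin k
        score += p * math.log(p / r)
    return score


# Seven quantile levels as in the hypothetical example.
LEVELS_7 = (0.10, 0.25, 0.35, 0.50, 0.65, 0.75, 0.90)
```

An expert whose quantiles coincide with those of the background measure scores approximately zero, and a more concentrated assessment scores higher.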

Missing items for some experts
In [6], two panels of 9 experts were gathered in order to assess uncertainty over economic growth and oil prices for Mexico in 2020 and 2030. In the panel corresponding to international gas and oil prices, expert A did not answer 10 of 26 calibration variables. No answer for expert D was recorded for 5 calibration variables. Similarly, no answer to 1 calibration variable was observed for expert G. The results of calculations obtained with missing items for both AY and CC are presented in Table 5. As in Table 3, the agreement between the calculations obtained with CC and AY is almost exact.
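One plausible way to handle missing items, sketched here as an assumption rather than as a description of AY's internals, is to skip the calibration variables an expert did not answer, so that the number of items entering the scores differs per expert. In the sketch a missing assessment is represented by `None`; the function name and the data are hypothetical.

```python
from bisect import bisect_left


def empirical_bin_frequencies(assessments, realizations, n_quantiles):
    """Interquantile hit frequencies for one expert, ignoring missing items.

    assessments -- per item, a list of quantile values or None if the
                   expert gave no answer (assumed representation)
    Returns (frequencies, number_of_answered_items).
    """
    counts = [0] * (n_quantiles + 1)
    for quantiles, realization in zip(assessments, realizations):
        if quantiles is None:
            continue  # unanswered item: does not count for this expert
        counts[bisect_left(quantiles, realization)] += 1
    answered = sum(counts)
    if answered == 0:
        raise ValueError("expert answered no calibration variables")
    return [c / answered for c in counts], answered


# Hypothetical data shaped like expert A above: 26 seed variables,
# of which the first 10 were not answered.
assessments = [None] * 10 + [[10, 50, 90]] * 16
realizations = list(range(26))
frequencies, n_answered = empirical_bin_frequencies(assessments, realizations, 3)
```

The per-expert item count returned here is what would replace the common number of seed variables when the statistical accuracy is computed for that expert.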

Conclusions
The MATLAB toolbox named AI, which combines expert judgments by applying Cooke's classical model for structured expert judgment, has been extended. The new software is called ANDURYL. The main purpose of developing these toolboxes is to create open source solutions that can be used by practitioners and researchers who are interested in applying or further developing Cooke's method. AY has inherited all advantages of AI discussed in [2]. In addition, in comparison with AI and/or CC, AY is fully open source and allows for user defined quantiles (see 4.1) and missing items (see 4.2).
The implementation of Cooke's classical model in the software tool presented in this paper has been validated successfully against a range of studies presented in [4]. Despite the limitations of the current version of AY, it is the authors' belief that, like AI, the developed toolbox will be valuable to those who are interested in developing and further applying the method. It is the ambition of the authors to extend AI and AY with more features than those currently available in CC and with the more recent techniques for elicitation of multivariate dependence [7].

Declaration of competing interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.