An analytical framework for postmortems of European foreign policy: should decision-makers have been surprised?

This paper develops a novel theoretical framework for the conduct of postmortems after major foreign policy surprises for the European Union and its member states. It proposes a taxonomy of surprise which elucidates how officials or organisations experience both sudden and slower-burning threats. It argues that foreign policy surprises in European settings require a closer look at who was surprised, in what way, and when. The paper outlines six vital performance criteria and three key attenuating factors, allowing us to better ground judgements about foreign policy performance as well as to advance realistic recommendations on how to improve.


Introduction
Intelligence analysts, diplomats and foreign policy-makers across Europe admit they have been taken by surprise several times over the past decade by momentous events in the countries neighbouring the European Union (EU). With regard to the Arab uprisings, the chief of the UK Secret Intelligence Service claimed the events were 'unpredictable' because 'all the organisations that hold the secrets had no clue it was going to happen.' 1 Similarly, the rise of the so-called Islamic State between 2013 and 2015 rapidly outpaced expectations and surprised authorities and terrorism analysts in both the United States and Europe. 2 Finally, for many observers (government officials and experts alike), 3 a surprising chain of events unfolded in Ukraine after President Yanukovych's unexpected decision not to sign the Association Agreement with the EU in 2013 sparked large public protests, the flight of the President to Russia, the Russian annexation of Crimea and a military conflict in Eastern Ukraine.
Surprise is an inherent and recurrent feature in international relations. 4 Yet many practitioners and scholars argue that contemporary threats to national security are particularly difficult to anticipate and analyse. This is, variously, attributed to the contestation of the post-Cold War order by authoritarian states, the gradual erosion of the power of the state vis-à-vis individual citizens and non-state actors, the power of social and political movements enabled by advances in technology, or the sheer speed, volume and impact of information production and opinion formation today. 5 While some argue that people and skills in the intelligence community have not kept up with this transformed threat environment, 6 one can also object that some skills and insights from the Cold War era, for instance on the assessment of Russian military capabilities, tactics and disinformation techniques, have been partially forgotten. The intelligence literature emphasises the cost of such surprises to decision-makers and, ultimately, the citizens they serve. 7 This conclusion is shared by the literature on disasters and emergencies in other areas. 8 Surprised organisations and decision-makers are more likely to miss opportunities for preventing or pre-empting attacks and other threats, tend to be less well prepared for managing the unavoidable crises, and are more likely to look ill-informed and out of control in the eyes of citizens, taxpayers, and voters. Therefore, a key function of estimative intelligence is to reduce the probability of such surprise by trying to anticipate what might happen, its likelihood, and the expected consequences for government interests. 9 Foreign policy surprises with significant negative consequences are usually accompanied by public criticism and sometimes followed by postmortem inquiries. 
10 These typically aim to ascertain not only the soundness of political judgements and actions taken, but also whether policymakers should have been surprised in the first place, whether the surprise was avoidable and, crucially, what should and could be done to reduce the potential for such surprise in future cases. Postmortem exercises often vary in practice in terms of the relative emphasis they place on accountability versus lesson-learning or how broad or narrow their mandates are. However, the central purpose of a postmortem as an analytical tool is to identify and learn those lessons from a specific case that might improve an organisation's structural capacity to better anticipate and react to future threats. It is true that any lessons identified from postmortems will apply first and foremost to threats and regional contexts that are similar to those at the centre of the inquiry. They cannot deliver the same lessons as root and branch reviews of the performance of intelligence organisations and foreign policy systems across a range of tasks. However, we know that postmortems have in the past produced wide-ranging recommendations regarding reform training, resources, institutions and processes, many of which are subsequently implemented, for better or worse. Therefore, better postmortems can improve the basis for judging actors' performance and for making more realistic recommendations about what can and should be improved. Even though strategic surprise can rightly be described as 'the academically most advanced field in the study of intelligence', 11 we identify four areas for potential improvement to the state of the art: To start with, most of the core assumptions and findings in the literature tend to focus on strategic surprises rooted in a deliberate strategy of states to deceive adversaries and thereby seize the advantage, usually through surprise military attacks. 
This focus leaves out those instances in which actors are taken by surprise by slower-burning, indirect or non-kinetic threats by state as well as non-state actors. 12 Since these types of threats have been on the rise, a suitable analytical framework should be able to address them.
Secondly, surprise itself remains underconceptualized in the literature. For instance, Handel discusses the incentives to seek surprise attacks as well as the relationship between the psychology of surprise and the structure of conflict. 13 The literature theorises who is likely to engage in surprise attacks, who the likely victims are, and when such attacks are likely. 14 Apart from the basic distinction between strategic and tactical surprise or typologies that address the impact of surprises, 15 the existing literature offers little help in distinguishing between different kinds, degrees, and objects of surprise or how surprise may differ significantly among as well as between analysts, policy-planners, and decision-makers.
Thirdly, most evaluative judgements of intelligence performance concentrate on distinguishing intelligence failures from policy failures, rather than evaluating more comprehensively the foreign policy process and how it might be improved. Although discussions about the relationship between intelligence analysts and policy-makers appear frequently in the literature, 16 there are few attempts to ground foreign policy postmortems in broader normative frameworks concerning the role of experts and evidence in public policy of democratic states. As a result, this paper is not focused narrowly on 'intelligence failure postmortems', but on foreign policy postmortems where actors often closely interact and depend on each other for their competent performance.
Finally, we want to advance a framework that can be applied to non-US contexts. Most of the strategic surprise literature, and particularly some of the most influential works, is written by US-based scholars on cases that involve the United States government as the main addressee and the US intelligence community as the main producer of estimative and warning intelligence. We argue that the US system differs significantly from European settings in terms of the role of the intelligence community, its relationship to policy-makers as well as the foreign policy process. This is relevant because the underlying conceptual toolbox for postmortems derived from this literature contains some hidden or inapplicable assumptions when used in a European setting. By European settings we mean the foreign policy machinery in Brussels and national foreign policy systems that sometimes support EU objectives and sometimes pursue their own foreign policy interests separately from, if not completely independent of, the EU. This paper is not looking at bilateral relations between member states or relations between some member states and non-EU actors.
The article proceeds as follows: In the first section, we will discuss the most important differences between US and EU contexts and how they matter to postmortems. Next, we offer a more fine-grained conceptualisation of surprise and explain why a better measurement of who is surprised in what way matters for postmortem exercises. Thereafter we discuss performance expectations towards knowledge producers and decision-makers grounded in a conception of knowledge-sensitive public policy in democracies. We distinguish between a range of distinct performance aspects and their trade-offs, but also discuss criteria for distinguishing excusable shortcomings and mistakes from avoidable, negligent or reckless errors in threat assessments. The final section engages in more detail with the how-to questions of designing postmortems that avoid common problems of hindsight bias and over-determinism.
Our theoretical argument is underpinned by the findings from the process-tracing of warning-response dynamics in six case studies involving different EU and member states conducted in the context of two major research projects: Rwanda 1994, Darfur 2004, Georgia 2008, Arab uprisings 2010, Ukraine 2013, ISIS/Daesh 2014. These involved extensive document analysis of official, media and NGO sources as well as more than 170 interviews conducted across the two research projects over a period of 11 years. 17

Why differences between European and US contexts matter to foreign policy postmortems
What are the main differences between US and European settings that matter most to surprise-focused postmortems? As space constraints limit a more comprehensive investigation of all the potentially relevant factors, we propose to focus here on the role of the intelligence community vis-à-vis other potential producers of estimative and warning intelligence, the key actors and features of the foreign policy process as they shape receptivity levels and politicisation dynamics, and, finally, the nature of the relationship between knowledge-producers and decision-makers given the inherent tensions over surprising and inconvenient knowledge claims.
The US is unusual compared to other states in terms of the absolute and relative size of its Intelligence Community (IC) and the differentiation among agencies specialised in different types of intelligence. It is a highly professionalised and institutionalised community with dedicated career tracks, training and doctrines, including on warning. It has a dedicated structure for estimative intelligence over the mid and long term via the National Intelligence Council, including a National Intelligence Officer for Warning and an overall Director of National Intelligence. It has the capacity to produce warning intelligence about a wide range of countries and phenomena and the coordination structures to arrive at a consensus view among participating agencies. It operates with substantial confidence in relaying these assessments to decision-makers and keeping them secret. As a result, there is more demand as well as justification for conducting postmortems focused more narrowly on the performance of the intelligence community.
In contrast, for historical reasons but also reflecting foreign and security priorities, the dedicated intelligence structures in European settings tend to be smaller in relative terms as well as less professionalised, institutionalised, coordinated and autonomous from decision-makers. As a result, a much greater proportion of relevant knowledge about a variety of threats will be produced by non-IC analysts, whether they are diplomats and desk officers of Foreign Ministries or analysts from agencies and ministries devoted to, for instance, development and humanitarian aid. Moreover, the IC in most European states, and particularly in the Brussels context, is less likely to speak with one voice and carries less authority than in the US system. While the EU does not have an intelligence agency as such or the capacity to send agents into the field, it does in principle have the capacity to gather, analyse and produce intelligence in a broader sense from its own varied sources, including some 140 EU delegations, 16 civilian and military missions, or the EU Satellite Centre. Member states share, to various degrees, diplomatic reporting and assessed intelligence with EU institutions and each other through a common communication infrastructure (COREU). 18 EU and national officials form a close professional network labelled by some an 'epistemic community', 19 or a 'community of practice' 20 in foreign affairs and strategic intelligence. 21 The implications are threefold: first, one needs to cast the empirical net wider than units that are ostensibly concerned with 'intelligence.' Second, even nationally focused postmortems need to take the EU context and related intelligence flow into account when assessing what was knowable and known. Third, warnings will be diverse in content, reflecting distinct cognitive lenses and policy agendas of the originating agencies and ministries.
Secondly, US foreign policy has traditionally reflected a grand strategy as a superpower with geographically wide-reaching interests and the willingness and ability to underwrite security and shape the rules of global order. The President as the Commander-in-Chief has significant authority to shape and change national security and foreign policy or to appoint and let go secretaries of state. This authority is further reinforced by the electoral system and the opportunity to make thousands of political appointments across federal government when coming into power, including in ministries in charge of foreign and security policy. This means that decisions can be taken and implemented quickly when warnings are prioritised at the top. The centralisation of authority also means that foreign policy can be a great asset to the president's prestige and electoral advantage, but can also turn into a major liability after alleged or actual failures to anticipate threats and prevent harm. Foreign and security policy has high salience inside the Washington beltway and is thus vulnerable to partisan politicisation pressures.
In contrast, with the exception of the UK, most political systems in Europe are not two-party systems and do not provide the Heads of State with such power in foreign and security affairs. In fact, most countries are governed by coalitions of two or more parties. This often leads to a situation where Foreign Ministers and heads of government hail from different parties. The EU itself is an extreme case of the dispersion of authority where most decisive action requires extensive consultation and often unanimous agreement (or at least constructive abstention) from all member states represented in the Foreign Affairs Council (FAC). The EU's HR/VP for Foreign Affairs and Security Policy has significant power of administrative resources and informal power for agenda-setting, but little autonomous power for decision-making. The EU is therefore often vague in articulating a ranked order of interests and priorities. This means not only that postmortems in European settings need to be realistic in their expectations about the speed of decision-making and the prospect of finding a consensus, but also consider a wider range of decision-makers with potentially variable levels of receptivity to warnings.
It largely follows from the above that the 'intelligence-policy nexus' 22 in Washington looks quite different to those in Brussels, Berlin, London, Paris, or Warsaw. A significant part of the US-centred literature on strategic surprise and postmortems engages more or less directly with the significant tensions between a well-resourced and autonomous IC and the political leadership of a given administration. Generally, this relationship is more formalised and the boundaries between the two are more clearly defined and policed than in European settings. A great deal of attention is focused on the question of whether a negative surprise was due to an intelligence or a policy failure. The fear that politicisation pressures undermine professional standards in analysis, make products less objective, and producers less credible is a central concern of the literature, alongside discussions over what an optimal relationship should look like (see further below on the Kent-Kendall-Gates debate).
In contrast, in European settings, the relationship between knowledge producers and decision-makers is more fluid and closer than in the US. There tends to be a less formal distinction between information and analysis and the policy-planning and decision-making process in terms of process as well as personnel. Given the dispersion of authority, and the variety of knowledge producers, one can find fewer opportunities and incentives for attributing blame and taking credit. Moreover, top decision-makers in European settings, and especially within the EU context, have less authority to pressure and replace officials and analysts as compared to the US system. There tends to be less turnover among civil servants after elections, with most senior officials, analysts and diplomats staying in office after a change in administration, except for a handful of the most senior positions. This is particularly true for the EU's civil service. Commissioners and High Representatives may change every five years but heads of Directorates-General tend to stay for longer. This also reduces the potential for politicisation pressures somewhat. In this supranational setting, politicisation as a source of bias can come in various forms and affect relationships in different ways. It is less party-political (although this can play a role too), but more frequently structured around perceived or actual national foreign policy biases or the policy agendas of sub-systems, for instance, regarding the European Neighbourhood Policy, Development, Trade or the Common Foreign and Security Policy.

Conceptualising and measuring surprise in foreign policy
The extensive literature on warning intelligence and (strategic) surprise is strongly concerned with questions of explaining why surprises occurred and whether they were avoidable, 23 whereas little attention is devoted to differentiating types or degrees of surprise. Although most authors agree that surprise is a matter of degree, 24 there is no consensus on how to best describe the spectrum between the extremes. For some, a complete surprise is the occurrence of an event that was not even considered a low probability, whereas more partial surprises are those when the subjects may have anticipated some elements, but not others, for instance, recognising the possibility or even probability of an attack from a given adversary, but failing to answer correctly questions about the timing, location or means. 25 Another closely related distinction is made between a strategic surprise, which concerns the broader and longer-term assessment of a given threat for the main benefit of senior decision-makers, and a tactical surprise, which is centred on shorter-term, more focused questions about the specificities of threat manifestation, prevention and management. 26 Other works define strategic surprise as a lack of preparedness based on incorrect judgements regarding when, where or how an attack would take place, without specifying a particular degree of surprise. 27 Kam identifies three key elements to measure the degree of surprise: whether a victim's inconsistent or erroneous expectations and assumptions were held more or less strongly; the timing; and finally, the degree to which the victim was prepared to deal with the event when it occurred, but not the final outcome or ultimate success of policy. 28 In contrast to Kam and others, we propose to distinguish questions about the nature and degree of surprise in a cognitive sense from questions of whether such surprise was justified, or questions of its impact. 
Whereas the first is core to what is to be explained here, the latter two muddy the waters by introducing either new normative considerations or largely unrelated extraneous factors into the explanandum. For instance, there may well have been early and clear warnings from some analysts, but recipients could still have been surprised because these were from sources with a dubious track record, who had cried wolf before, or who were contradicted by other more authoritative sources at the time. Similarly, being well prepared is as much a question of resources, capacities, good political judgement, and chance as it is about accurate risk identification and threat assessment. Governments may well be completely surprised by an event but turn out to be rather well prepared to deal with the consequences as military assets and civil first responders can be used for a range of different contingencies.
We propose instead to strip back surprise to the first element identified by Kam and define surprise in an actor-centred and primarily cognitive way as the degree to which a given individual, group or organisational unit in government recognises that recent or current events of substantial consequence to high-value interests contradict pre-existing assumptions, analytical judgements, and expectations. Indicators of surprise, or the lack thereof, at the actor or organisation level are sudden shifts in attention and organisational resources to these events and/or the acceptance of new evidential and causal claims relating to this threat. The lack of policy change or preventive action is, however, not a reliable indicator of surprise. The measurement of surprise is highly time-bound and should typically relate to the immediate aftermath of a threat manifesting itself in the eyes of the decision-maker. Some postmortems may raise questions about what might be defined as the immediate aftermath of threat manifestation for three reasons: The first concerns the nature of the threat itself. Except for clear-cut cases of attacks where the armed forces of a state are clearly identifiable as the source, as in Pearl Harbour, many other threats may evolve and manifest themselves in more gradual or opaque ways, for instance, the growing territorial control exercised by ISIS or cyberattacks that are detected late and where the scope and scale of the damage remain unclear for a while. Second, evidence related to threats may initially be ambiguous or strongly contested as to its attribution, such as in Russia's deployment of 'Little Green Men' without official insignia in Crimea flanked by the Russian authorities' outright denial of any involvement. Finally, actor-centred surprise measurement needs to deal with the problem of false or distorted memory if no evidence is available to measure whether and how officials expressed surprise in response to new information received at the time. 
Leaving deliberately untruthful and misleading statements aside for the moment, both officials and decision-makers may misremember their own sense of surprise at the time, because memory itself can be influenced by largely unconscious concerns of the present, including feelings of regret or shame. For instance, analysts may inadvertently underestimate their degree of surprise because it implies a degree of professional failure and potentially even sanctions, whereas decision-makers may unconsciously exaggerate their sense of surprise to justify why they did not act against a threat even though they had considered it possible. Postmortem analysts need to ascertain in as objective and fine-grained a way as possible at what points in time officials or organisations experienced a sense of surprise without engaging prematurely in critique.
In Table 1 we present a taxonomy of surprise in foreign policy that aims to better support the purpose of postmortems. We propose to assess the overall degree of surprise across three dimensions, which are both additive to the overall degree and also serve distinct analytical purposes as they each characterise different forms of surprise. The first dimension is the degree of cognitive dissonance caused as a function of the gap between what actors believe to be true in the aftermath of the threat manifestation and their prior beliefs about the threat. The largest scale surprises (Taleb's black swans or the proverbial bolts-from-the-blue) are threats that were not even considered by actors. At the other end of the spectrum are threats that may well have been considered and deemed at least possible, but deemed too unlikely to significantly shift attention, material resources or change policy. In the most extreme case of surprise, actors are likely to feel a sense of cognitive shock and strong pressure to fundamentally transform their threat perceptions, whereas in the mildest version it would entail updating the probability assessments related to the threat. Again, this does not necessarily imply a normative judgement of whether these surprises were, or could have been, foreseen, as this entails looking at case-specific diagnostic challenges as well as actor capacities, as we discuss in the next section.

[Table 1. Taxonomy of surprise in foreign policy. Recoverable entries (spread of surprise): entirety of government, analysts and decision-makers; most analysts and decision-makers; only some analysts and decision-makers.]

The second dimension is closest to the existing writing on strategic surprise in so far as we focus here on the scope of the surprise and draw on the distinction between strategic threat assessment and tactical/operational threat assessment. It is often argued, for instance by Dahl, that strategic intelligence is less difficult than tactical intelligence even though the latter may be more effective for prompting decision-makers to pay attention and act preventively. 29 The question here is how wrong or right actors were across a range of threat-relevant questions: 'who is posing a threat', 'why' and 'under what conditions', 'what are their current and future capabilities', 'what are their concrete plans to do what', 'where and when'. 30 It is relatively easy to identify an actor with hostile intentions, but much more difficult to ascertain when and how such intentions can translate into significant harm. Even more complicated is the diagnostic challenge for threats that are not emanating from one particular actor, such as a state or a terrorist group, but which emerge from bottom-up dynamics of multiple actors, trends and social movements, as was the case with the Arab uprisings. Even if most actors across Europe were not surprised that instability in one North African country could spill over and cause instability in another, they were still surprised about the way it happened, the timing, speed and extent of the spread in the region as well as broader and less immediate consequences for European countries. 31 Finally, we need to look more closely at the spread of who has been taken by surprise within a political system. 
While the conventional distinction between analysts and (political) decision-makers is useful, we aim for a more fine-grained assessment for two reasons: Firstly, as mentioned above, foreign policy authority is dispersed in a European context, both at the member state level through coalition governments, and at the European level, where multiple foreign policy institutions co-exist and sometimes overlap. We know for instance that senior decision-makers within EU institutions and the Foreign Affairs Council differed in their level of surprise concerning the events in Georgia in 2008 and Ukraine in 2014. 32 Secondly, many threats are monitored and assessed by multiple parts of the EU machinery and through different types of products, engaging different parts of what might be termed the intelligence community, including diplomats and other officials working on threat-relevant technical issues such as trade, energy and home affairs, especially if threats are not purely military in nature.
It matters whether the sense of surprise was near universal among the most relevant knowledge producers or whether the spread of awareness among analysts or, indeed, among decision-makers was particularly uneven. Divergences amongst different kinds of analysts or agencies can be expected given differences in sources used, disciplinary backgrounds, analytical methods or informal norms, but could also arise from ways of information-sharing and joint analysis. Another form of an uneven spread in surprise could be seen in divergent perceptions among officials at various hierarchical levels, as senior officials may at times ignore, discount, or disbelieve assessments by more junior officials and may thus end up being more or less likely to be surprised by certain threats. Again, this does not mean that those who have been most surprised were necessarily at fault, but pinpointing more precisely who was more or less surprised can help to investigate more accurately the potential causes. It equally helps to arrive at more persuasive judgements, i.e., whether the observed differences can be explained, for instance, by weaker analytical capabilities of an organisation or rather were caused by political signals and administrative cultures hostile to inconvenient analytical judgements.

Should they have been surprised? Normative expectations for estimative intelligence production and reception in foreign and security policy
The challenge of casting sound judgements of estimative intelligence production and reception in foreign policy is just a subset of broader debates in International Relations (IR) and political science about whether the social sciences can produce sufficiently reliable knowledge for decision-making and the appropriate role of expert civil servants vis-à-vis democratically elected decision-makers. The field of IR has struggled to show that the knowledge base for anticipating attacks, civil war, and mass atrocities is reliable, specific, and actionable enough for decision-makers to accept and base preventive policy on. 33 More broadly, many IR scholars harbour doubts that one can establish defensible criteria for how and what policy-makers should know and learn, let alone on how to act, because of intrinsic limitations of all claims about the socio-political world, but also because they consider the notion of policy-success itself as contested. 34 This resonates with the view of some scholars outside of IR who believe that all processes of learning and policy evaluation are based on underlying intersubjective assumptions that reify particular interpretations of reality and imply distinctive value judgments. 35 In contrast, intelligence studies scholars have not shied away from judging the performance of analysts and, in some cases, blaming mainly policy-makers for failures of prevention or lack of preparedness. A review of public postmortem inquiries by Farson and Phythian highlights how they can vary regarding their degree of openness and transparency of investigative processes, their autonomy from the executive as well as regarding the speed with which they deliver results. 
36 Yet despite the growth in academic as well as public postmortem exercises, still little is known about the theoretical assumptions behind postmortems, including the definition of appropriate performance criteria towards either experts or decision-makers rooted in a sound model of knowledge use in foreign policy. 37 Some postmortems devote considerable attention to questions of what should have been done or not done in terms of political judgement about policy, such as the Chilcot Inquiry's 38 discussion of the premature abandonment of deterrence, whereas other postmortems are more focused on the accuracy of intelligence products and the performance of their producers, such as the 9/11 inquiry. 39 Scholars disagree in their expectations towards knowledge producers as to whether the provision of accurate intelligence or at least following a sound analytical process are 'good enough', or whether analysts should also bear some responsibility for ensuring that assessments are actually read, used and have some kind of impact on decision-making. 40 The answer to these questions depends on where one stands in the debate about the ideal intelligence policy-making relationship, commonly known as the Kent-Kendall-Gates debate. Supporters of the Kent-model emphasise the need for maximum independence between analysts and policy-makers in order to protect the objectivity of analysts and their products from politicisation pressures. Conversely, proponents of the Kendall and Gates-models advocate a closer relationship so that intelligence reports are actionable, relevant, and correspond to immediate policy requirements. 41 Furthermore, a key issue of contention is how much discretion is given to political judgement about whether, when and how to pay attention, prioritise and, ultimately, act. 
It may be relatively uncontroversial to expect that all plans for terrorist attacks that have reached a certain stage should be thwarted, whereas in most other cases and types of threat, assessments about what should have been the right decision, either with or without the benefit of hindsight, require the consideration of a wider range of factors as well as some sophisticated counter-factual reasoning.
This narrowness in approach and the lack of clarity in intelligence studies and IR contrasts with the public policy evaluation literature, which is permeated by strong sets of expectations about 'evidence-based policy-making' and strong norms about precaution in environmental policy-making in areas such as healthcare or disaster management. 42 The epistemic basis for policy-making in foreign affairs is generally weaker than in domestic public policy. At the same time, expanding our conceptualization of surprise beyond state-led military attacks requires us to consider more diverse areas of relevant expertise and therefore include scientifically-grounded knowledge, for instance relating to the impact of climate change on migratory movements and conflict. Foreign policy-makers do not necessarily need to listen to advice on what to do and when to act because most of foreign policy, and especially conflict prevention, is not a technocratic exercise in doing 'what works' but involves a range of difficult value-judgements, trade-offs, and uncertainties about unintended consequences. 43 Sometimes policy-makers realise that they do not have any realistic option to stop a crisis from happening or find that the only options are either politically not feasible or come with too high opportunity costs. Policy-makers still retain the right not to prioritise a particular threat above others and they may well disagree about some or all of the actions that are being suggested for preventive or mitigating action; these decisions require a different set of criteria related to good judgement in foreign policy.
We propose to orient our foreign policy postmortems towards a normative guiding model of well-informed, anticipatory, and collaborative foreign policy-making. Well-informed relates both to the quality of the production process behind estimative intelligence as well as to the receptivity of decision-makers to well-evidenced analysis of future threats from authoritative sources. Anticipatory does not mean that all future harm can be easily predicted or prevented, only that governments should aim to live up to their strategic and policy commitments contained in their foreign and security strategies with regard to pre-emptive, proactive, preventive, and resilient foreign policy. 44 It means that knowledge producers ought to invest a minimum degree of effort into the investigation of consequential and potentially threatening futures, whilst decision-makers should ringfence a minimum degree of bandwidth to engage with such futures and high quality warnings. Collaborative does not deny significant asymmetries between and inevitable tensions within the relationship between civil-servants and decision-makers, but posits that mutual respect and understanding of one another's distinct roles, duties, requirements and limitations is central to reaching sound analytical judgements about future threats as well as mitigating politicisation pressures. If we break this down even further in Table 2 below, we can see that knowledge producers as well as decision-makers can be measured against the following six performance indicators. Even though three of these criteria apply primarily to either knowledge producers or decision-makers respectively, it is important to recognise their interdependence. 
It is perhaps obvious that the extent to which decision-makers can be held accountable depends on the quality of the intelligence they have received, but equally, intelligence analysts may have good reason to attribute some of their failings in what questions to ask, when to communicate warnings and how to make them convincing, to overly remote, inaccessible, uninterested, hostile or even vindictive decision-makers and the culture they create within an organisation.

Accuracy
Asking whether knowledge producers reached accurate judgements about threat aspects that mattered is an indispensable and undisputed component of postmortems involving a degree of surprise. It is necessary to test whether decision-makers should have been surprised about a given threat event if, in fact, they had been provided with (largely) accurate assessments of key aspects of it. Moreover, all knowledge producers ought to aim for maximum accuracy in their analytical judgements as inaccurate threat assessments can lead to costly mistakes. Assessments of accuracy should never be the end-point of assessing the performance of knowledge providers, but should leave ample opportunity to consider in a second step mitigating factors that would lead us to higher or lower accuracy expectations. For instance, while anticipating surprise attacks can be situated at the more difficult end of the diagnostic spectrum, other types of dynamics relating to mass atrocity risks or migration can be assessed in probabilistic terms on a more reliable epistemic basis. Yet, a number of authors rightly stress that one should be cautious about inferring a good analytical process from accurate assessments or vice-versa. 45 Analysts can be right in their conclusions for the wrong reasons, as errors of over- and underestimation of risks can cancel each other out or unforeseeable chance events can intervene to influence outcomes. Jervis distinguishes between type 1 errors, where analytical claims were inaccurate even though a sound intelligence process was followed, and type 2 errors, where accuracy errors could well have been avoided with better processes of information collection and analysis. 46 Moreover, it is problematic to provide a 'definitive' measure of the accuracy of predictions and forecasts 47 as such claims may be expressed in vague, hedged, or highly uncertain ways and at various points in time in relation to a given event.

Timeliness
The accuracy criterion is closely related to the criterion of timeliness of intelligence at the moment when it is brought to the attention of relevant decision-makers. Especially in cases of bottom-up or slow-burning phenomena rather than surprise attacks, it is often easier to arrive at a more accurate and confident assessment of a threat if one waits for more signals and indications from an evolving situation, for instance, human rights violations or public protests exceeding country-contingent 'normal' levels. However, both deliberate waiting as well as unintentional delays can come at a high cost, so it is crucial for the analyst not to be overtaken by events and to provide assessments early enough to maximise options and minimise risks for decision-makers. 48 Some policy instruments such as financial aid targeted at some root-causes of conflict, or for example the forging of links with the newly important political actors after the Arab uprisings, require a significant amount of time to take effect. Moreover, some instruments such as the deployment of peacekeepers, election or border monitors require a minimum degree of lead-time as many of these assets are not on stand-by, but require contributions from member states and the delegation of relevant personnel from their normal line of work. Warning intelligence that arrives too late for such key instruments will have lost some or all of its usefulness to decision-makers. The need for timeliness applies most strongly in cases involving fast-paced developments. Strong liaisons between intelligence and policy departments can enable the former to provide relevant and timely intelligence to inform policy decisions. 49 The greatest challenge here is a rigidity of information processing lines, which can slow down or hinder the formulation of policies that correspond to the new realities on the ground.

Convincingness
One of the less examined aspects of the strategic surprise literature is the need for estimative intelligence in general and warnings in particular to be communicated in a way that is likely to be understood and believed by the decision-makers to whom it is addressed. The ability to convince arises from a combination of factors such as clarity, specificity, fear appeal, authoritativeness, and credibility of the source, and more generally, the degree to which intelligence is successfully tailored to the 'consumer' in terms of content, evidence used, timing of delivery, channel, format, and actionability. 50 The literature does recognise that analytical judgements need to be sufficiently clear regarding their meaning and importance. 'The absence of clarity,' wrote Handel, may 'strengthen the tendency of some statesmen to become their own intelligence officer.' 51 The intelligence product thus should have an appropriate form and length, which is digestible for senior decision-makers who are notoriously short of time. While in the absence of certainty, cautious warnings are still better than no warnings, the ultimate goal is to provide specific, clear and reliable answers to the 'w-questions'. Intelligence analysts should clearly indicate how confident they are in their judgement, what is known and what is unknown, and what the analysts have inferred. 52 They should not 'purchase' greater persuasiveness through exaggerated confidence along the lines of the notorious statement of former CIA Director George Tenet, who told President Bush there was a 'slam dunk case' that dictator Saddam Hussein had unconventional weapons. 53 They should be as specific as possible about the probabilities underlying analytical judgements. We know from past cases such as the Bay of Pigs invasion that decision-makers can easily misinterpret qualitative terms such as 'fair chance' as success being 'likely' rather than having odds of merely 1/3. 
54 Gaps between assessments and the knowledge on which they are based should continuously be made explicit. 55 Effective communication requires an analyst to have a good understanding of their country's foreign policy and its overall priorities in a given region. This 'common frame' specifies vulnerabilities and prescribes ways of recognizing relevant developments. 56 In addition, the most useful intelligence is based on a high awareness of the pre-existing levels of knowledge, worldviews, hot-buttons, agendas, and information processing habits of key decision-makers. Furthermore, the more actionable the intelligence is, the more likely it is to be listened to and acted upon. 57

Due attention and priority
Postmortem exercises need to take capacity restrictions of foreign policy systems seriously when judging whether decision-makers have paid enough attention to estimative or warning intelligence. There is always more analysis than decision-makers have the cognitive resources to pay attention to and understand. Similarly, there are always more potential problems than states and international organisations have the capacity to prevent, mitigate or prepare for. The bottleneck problem of a small number of senior decision-makers sitting at the top of the pyramid faced with a massive inbox of intelligence assessments is particularly pronounced in the case of the United States with its huge and complex intelligence community, but the generic problem is the same for other actors, except that in Europe there may be multiple relevant decision-makers at different levels. Moreover, decision-makers may be at least partially forgiven for prioritising more immediate and certain problems and crises over more distant and uncertain ones. However, this does not mean giving carte blanche to decision-makers to simply ignore, delay and deprioritise dealing with intelligence assessments and warnings that do meet key criteria of objective quality in the terms discussed above, i.e., which are persuasive, timely, and clearly consequential in terms of their implications for citizen interests. One should particularly look closely at how and when decision-makers engage with estimative intelligence that already comes with a high degree of priority from within the intelligence community or is endorsed by a wide range of analysts and senior officials. 
This applies, for instance, to intelligence delivered by personal briefers to senior decision-makers or distilled into key documents, presented at regular decision-making moments in the life-cycle of early warning and risk monitoring exercises, or presented either in response to a demand by decision-makers or bottom-up as a highly resource-intensive exercise of analysis involving experts from across government. One could also consider the degree of consensus among analysts over the analytical judgement and the consequences that are being highlighted to judge whether a particular intelligence estimate should have been given more or less attention and priority.

Openness to discordant and potentially inconvenient claims
The openness to having one's existing beliefs challenged is widely recognised in the literature as a virtue of intelligence analysis in general and estimative intelligence about surprising and potentially threatening futures in particular. The lack of such openness and challenges as well as the influence of an excessive culture of consensus are also frequently noted vices in postmortems. As Jervis wrote about the case of Iraq, once the view was established that the country was producing weapons of mass destruction 'there not only were few incentives to challenge it, but each person who held this view undoubtedly drew greater confidence from the fact that it was universally shared.' 58 This confirmation bias as well as the failure to examine alternative hypotheses can equally exist at the decision-maker level. Risks of groupthink in the decision-making process occur when members do not express deviating opinions, challenge assumptions, or suggest new ideas that others may disagree with. We know from research that some senior decision-makers, such as the former EU High Representative Javier Solana, actively seek out and encourage alternative views from civil servants and experts, while others avoid such views through various filtering and bypassing mechanisms such as close advisors and separate committees, strongly push back against discordant evidence when confronted with it, or use their power to intimidate, exclude and repress sources with politically inconvenient advice. While the politicisation of intelligence can be inevitable for issues that are made salient through the news media or become the subject of political contestation, it matters greatly how both analysts and decision-makers deal with such politicisation and the pressures it brings to reduce politically inconvenient messages, to screen out complexity and countervailing evidence, and to protect conventional wisdom.

Acceptance of threat analysis
The expectation that decision-makers should accept intelligence in terms of its knowledge claims about a probable future does not interfere with their political prerogative in decision-making about whether to act on any recommendations. Moreover, there are occasions when decision-makers may have justified confidence that their own analytical judgements are superior to the advice they are getting, for instance, if they themselves have relevant training and experience related to the conflict regions, are able to draw on their own contacts and networks grown over a period of time, or have privileged access to and insights about the thinking of foreign senior decision-makers due to their interactions with them. Naturally, one will need to be very cautious before accepting such explanations for disbelieving or discounting analysis produced through a rigorous process involving authoritative experts from within government as the literature highlights the tendency of politicians to overestimate their own knowledge and analytical acumen. 59 On the other hand, one will need to take into account the strength of the evidence underpinning the intelligence, the track-record and authority of the main source, and other factors to do with the clarity and persuasiveness of the intelligence as discussed previously. Decision-makers are perfectly entitled to reject claims made by a source with known biases or one that can be rightly suspected of hidden political biases and of being motivated by an intent to manipulate. Conversely, decision-makers should be held to account if they simply disbelieve high-quality intelligence for reasons to do with a lack of motivation to engage with the evidence (laziness), a reluctance to believe because it would amount to an admission of previous failure (denial), or misplaced optimism that harm would be avoided so that strongly held policy preferences can still be achieved (wishful thinking).

Key performance enabling or constraining factors
Assessments of whether or not analysts or decision-makers should have been surprised or performed well in a case depend on interactions between these actors and they thus should not be judged in isolation from each other's performance. 60 At the same time, the performance of both analysts and decision-makers needs contextualisation as there can be aggravating or mitigating reasons for being caught by surprise. We propose that the three most common factors are case-specific diagnostic challenges, pre-existing relevant capacities for knowledge production and the influence of the prevailing political environment at the time. First, the ability of knowledge producers to anticipate or predict certain events is constrained by the diagnostic difficulty of the case at hand, which itself is a function of several factors. One crucial factor is the degree of discontinuity the event poses with the status quo ante. To what extent does the event break with the patterns of the past? Humans tend to take a linear view of the future, perceiving it as an extrapolation of present trends. And while most future occurrences can indeed be extrapolated from the present, for example in the field of climate change, in the uncertain international system, events with most impact are often non-linear, black swans, or bolts from the blue. 61 The search for such events poses diagnostic challenges to the intelligence analyst as envisioning a multiplicity of futures, their possible consequences, and their threat levels requires often costly out-of-the-box thinking. And even then, prediction is not always certain. The self-immolation of Mohamed Bouazizi on 17 December 2010 had massive ripple effects across the Middle East, triggering the Arab uprisings in multiple countries and even leading to civil war. This series of events was impossible to forecast, even though experts knew for years that countries like Tunisia and Egypt were powder kegs. 
62 Cognitive simplification, i.e., making sense of complex patterns by simplifying and filtering them into familiar frames, patterns, and stories, exacerbates this difficulty of anticipating large breaks with the status quo. 63 Surprise-sensitive forecasts of emerging threats are very resource-intensive as they require manpower to distinguish weak signals of change from the noise of masses of routine reporting and data. Another objective diagnostic difficulty arises from the complexity and speed of threat dynamics as well as difficulties of obtaining information about geographically remote and underdeveloped countries or regions. For instance, one of the most surprising features of the genocide in Rwanda was the sheer speed of the killing combined with the challenge of getting reliable information from remote parts of the country. 64 In the case of Ukraine, it was difficult to gauge the military situation on the ground in Crimea in February and March 2014, or to even identify an unambiguous casus belli. 65 The combination of deception and the spread of false information surrounding the Crimea invasion meant that the factual threshold that evidence of Russian actions in Crimea had to cross was high. 66 Although deception is as old as warfare itself, 67 Russia's ability to 'merge the overt and the covert' in combination with its so-called 'information operations' underlines how modern threats have changed. 68 This ties into another complicating factor, namely the degree of credibility of both sources and experts when flagging threats, which is affected by previous communications. Georgia faced this problem in 2008, when Western governments were hesitant to take its account of Russian intentions at face value due to the Georgian reputation for 'crying wolf'. 
69 Finally, the diagnostic difficulty of a case is impacted by the degree to which the situation cuts across various areas of technical and geographic expertise, requiring a combination of knowledge from different bodies in an organization.
The second category of enabling or constraining factors concerns pre-existing knowledge and policy response capacities to anticipate and potentially respond to this kind of threat. It concerns the capacity of and processes in intelligence production. Regarding capacity, the question is whether the resources and expertise available were sufficient to engage in high quality information gathering and analysis for the case at hand, for instance, whether there were relevant country experts or whether senior decision-makers had previous country or case experience. After the Ukraine crisis, many European countries for example criticized the steep decline in Russian speakers and Russia experts after the end of the Cold War. Equally important is access to relevant intelligence (e.g., satellite images, human intelligence from the inner circle of the conflict party, up-to-date and reliable information from local embassies, field missions, social media or envoys). Former CIA deputy director Michael Morell described how the CIA was surprised by the Arab uprisings precisely because it did not access the right sources of intelligence: 'We failed because to a large extent we were relying on a handful of strong leaders in the countries of concern to help us understand what was going on in the Arab street. We were lax in creating our own windows [. . .] the intelligence community was not doing enough to mine the wealth of information available through social media.' 70 Regarding process, an important question is whether there were well-established assessment procedures for the kinds of risks or threat at hand and whether the processes followed good practice in terms of monitoring and reviewing. Another important question regarding knowledge and policy response capacities is whether intelligence producers had access to relevant instruments or resources that could be mobilised at sufficient speed to address the problem if timely intelligence had been provided.
A third and final factor is the prevailing political environment, which may greatly impact both intelligence production and receptivity to it. Power transitions in government or major personnel changes in intelligence production have the potential to greatly distract both decision-makers and analysts. Additionally, decision-makers routinely deal with a number of crises and issues simultaneously. The Ukraine crisis, for example, played out when EU capitals were already being overwhelmed by the global financial crisis, the eurozone crisis, and the aftermath of the Arab uprisings. Such agenda competition may impact receptivity to intelligence. What is more, decision-makers are, like the intelligence analysts, human beings who struggle cognitively with entertaining multiple hypotheses and scenarios at the same time. 71 Meanwhile, despite the fact that both decision-makers and analysts may be to some extent distracted by either foreign or domestic crises, we should expect a degree of ringfencing of resources for anticipating and responding to upstream problems.
More importantly, decision-makers often stand to gain or lose politically from the way a threat or crisis is perceived and framed. Scholars have asserted that after a strategic surprise, political leaders are more likely to attribute events to domestic failings than to the deception or secrecy of the adversary. 72 Warnings and threats can be framed by politicians in a process in which blame and responsibility are attributed with the aim of political gain. Warnings about a certain threat may thus become the subject of intense politicisation within the government or between government and opposition. Political conflict and contestation can therefore impact receptivity to intelligence, but equally the way the intelligence is interpreted and used.

Conclusion
We have argued that the strategic surprise literature could benefit from grounding postmortems in broader normative conceptions of the role of experts and knowledge in public policy-making as well as from showing greater sensitivity to the European setting, which differs in a number of important aspects from the US intelligence-policy nexus. Our argument is that a postmortem framework that works for both EU institutions and European member states has wider applicability to other national contexts than one which is modelled, implicitly or explicitly, on studies of a country that is obviously important but also in many respects atypical. We proposed to broaden our conceptualisation of surprises beyond sudden attacks to include slower-burning, non-kinetic phenomena like the rise of the Islamic State, the Ukraine crisis or the Arab uprisings. The new taxonomy of surprise in foreign policy is more fine-grained than existing accounts in order to better elucidate the different ways officials or organisations may experience a strategic surprise. By pinpointing more precisely who was surprised and in what way, our framework can help to investigate more accurately the potential causes of surprise and allows observers to cast more accurate normative judgements on whether the surprises could have been avoided and if so, in what way. This feeds into the overarching aim of evaluating how the foreign policy process handled these surprises while taking key attenuating factors seriously. These include diagnostic difficulties specific to each country or conflict case, the prevailing political conditions at the time, as well as pre-existing resources related to particular countries or threats. Some of the performance criteria relate primarily to intelligence producers, while others mainly apply to decision-makers or the organisational space between them, which shapes what kind of interaction and relationships are possible. 
We hope this framework enables scholars and practitioners conducting postmortems to separate questions of case-specific intelligence performance from the broader analysis of what cognitive, organisational or procedural factors produced that performance. It enables us to ask 'what if' questions that elucidate which changes are likely to improve intelligence assessments in the future for similar but not identical cases.