Human-machine collaboration in intelligence analysis: An expert evaluation

In this paper we illustrate how novel AI methods can improve the performance of intelligence analysts. These analysts aim to make sense of information that is often conflicting or incomplete, weighing up competing hypotheses which serve to explain an observed situation. Analysts have access to numerous visual analytic tools which support the temporal and/or conceptual structuring of information and collection, and support the evaluation of alternative hypotheses. We believe, however, that there are currently no tools or methods which allow analysts to combine the recording and interpretation of information, and that there is little understanding about how software tools can facilitate the hypothesis formation process. Following the identification of these requirements, we developed the CISpaces (Collaborative Intelligence Spaces) decision support tool in collaboration with professional intelligence analysts. CISpaces combines multiple AI-based methods including argumentation theory, crowdsourced Bayesian analysis, and provenance recording. We show that CISpaces is able to provide support to analysts by facilitating the interpretation of different types of evidence through argumentation-based reasoning, provenance analysis and crowdsourcing. We undertook an experimental analysis with intelligence analysts which highlights three key points. (1) The novel, principled AI methods implemented in CISpaces advance performance in intelligence analysis. (2) While CISpaces was designed as a research prototype, analysts benchmarked it against their existing software tools, and we provide results suggesting intention to adopt CISpaces in analysts' daily activities. (3) Finally, the evaluation highlights some drawbacks in CISpaces. However, these are not due to the technologies underpinning the tool, but rather to its lack of integration with existing organisational standards regarding input and output formats.
Our evaluation with intelligence analysts therefore demonstrates the potential impact that an integrated tool building on state-of-the-art AI techniques can have on the process of understanding complex situations, and on how such a tool can help focus human effort on identifying more credible interpretations of evidence.


Introduction
An intelligence analyst's job is to construct coherent hypotheses despite significant gaps and inconsistencies in gathered evidence, and present them clearly to decision-makers to inform their interventions (Heuer, 1999).
Current automated systems to support the day-to-day practice of analysts are (almost) exclusively focussed on two aspects of the problem. The first is data collection, aggregation and visualisation (IBM, 2017; Wright et al., 2006). Such tools help analysts collate, inspect and interact with a large dataset, and support the identification of relationships, for example through link analysis (Prunckun, 2010). Recently, crowdsourcing tools that enable the public to contribute information have also been introduced to supplement more traditional intelligence collection approaches (Stottlemyre, 2015). The second aspect on which tools focus involves listing and weighing up alternative hypotheses (Heuer, 1999), through automated analysis of competing hypotheses (Burton and Knowles, 2010; Schrag et al., 2016; Stefik, 2014; Tecuci et al., 2010). This analysis requires that all alternative hypotheses be identified from available evidence, and, if aided by automated inferential reasoners such as Bayesian networks (Schrag et al., 2016), the tools also require that each (aggregated) piece of evidence is given a weight or a degree of certainty.
We observe, however, that there is a gap in technology that supports the process that analysts perform after inspecting the data, and before the identification of hypotheses. The task of the analyst here involves the structuring of evidence in a consistent manner to select plausible hypotheses. This is currently done manually, supported only by generic spreadsheet and text-processing tools. The challenge, which we seek to address in this research, is to understand how automated reasoning can best complement human expertise in this evidential reasoning process.
Experienced analysts currently identify plausible hypotheses using a combination of manual approaches to assess available evidence, establish what information is credible, and understand what additional evidence may be required or what questions to ask to determine plausibility. This activity may be time critical so as to enable effective situational understanding, which poses significant challenges for individual analysts. The volume and variety of information that analysts must consider is significant, and evidence may be unreliable or conflicting, with important information missing. Collaboration may be used to provide peer-review, sharing the burden of analysis and helping in the validation of conclusions (Heuer, 1999). Such collaboration, however, requires analysts to work with a common model and a consistent world-view, which is hard to achieve in the real world.
When data is diverse and comes from different sources, analysts must reason about the reliability of the evidence leading to claims from information such as how, where, when and by whom the evidence was gathered and analysed. Cognitive biases may inadvertently be introduced in the process, preventing an analyst from drawing accurate conclusions. This process of interpreting evidence relies heavily on the expertise and training of analysts, and there is a distinct lack of methods to ease the high cognitive burden involved in forming hypotheses. Furthermore, there is a general lack of understanding of how the hypothesis formation process works, as it is not normally recorded, making it difficult for senior analysts to pass on their analytical skills to trainees. The analytical process is also resistant to automation due to the significant knowledge engineering effort required to process data and express reasoning patterns (Llinas, 2013).
In this paper, we illustrate how novel AI methods, based on a combination of argumentation theory, crowdsourcing and provenance reasoning, can contribute to improved performance in intelligence analysis. While existing systems focus mostly on information presentation and collection (Billman et al., 2006; Burton and Knowles, 2010; IBM, 2017; Wright et al., 2006), we co-designed our software tool, Collaborative Intelligence Spaces (CISpaces), with intelligence analysts to focus on the sensemaking activities around forming hypotheses from available evidence using patterns of defeasible inferences, or argumentation schemes (Walton et al., 2008). Our formal evaluation of CISpaces using the Technology Acceptance Model (TAM) (Davis, 1989; Venkatesh and Bala, 2008; Venkatesh and Davis, 2000) provides evidence that intelligence analysts benefit from the support they receive from the tool in their sensemaking activities. The contributions of this paper are thus manifold:
• In Section 3, we provide a blueprint for further co-design of artificial intelligence driven tools, by showing how to govern the process for a successful outcome.
• In Section 4, we expand on our preliminary conference paper (Toniolo et al., 2015) to illustrate the delicate interconnection between the various artificial intelligence techniques utilised and extended to achieve the co-designed objectives. In particular:
  • we advance the engineering of argumentation-based reasoning (Prakken, 2010) to identify plausible hypotheses as sets of acceptable arguments;
  • we show how to argue with, and about, crowd-sourced information (Brabham, 2008; Kamar et al., 2012; Whitehill et al., 2009) pre-analysed using Bayesian analysis;
  • we embed provenance analysis (Hartig and Zhao, 2009) in the argumentative process to establish the credibility of hypotheses.
• In Section 5, we provide empirical evidence that intelligence analysts benefit from the unique mixture of formal argumentation theory, crowdsourcing support, and provenance recording provided in CISpaces, using, for the first time in an argumentation-based system, the Technology Acceptance Model (TAM) (Davis, 1989; Venkatesh and Bala, 2008; Venkatesh and Davis, 2000).

Fig. 1.
A visualisation of Intelligence Analysis Issues and Requirements extending Pirolli and Card (2005).
A. Toniolo et al.

Our results suggest that the novel, principled AI methods implemented in CISpaces may advance performance in intelligence analysis. Although we designed CISpaces as a basic research prototype (Technology Readiness , during their evaluation, analysts benchmarked the quality of its features against commercial systems they use every day: we compare against them in Section 6. We collected evidence suggesting that the AI methods implemented in CISpaces can have a behavioural effect on the intention to adopt CISpaces by end users. The analysts' evaluation highlights drawbacks in CISpaces that predominantly result from the interface between the tool and data sources and from aspects of the user interface (rather than the underlying AI methods). We, therefore, conclude that for successful adoption by the intelligence analysis community, CISpaces will need data integration with existing organisational standards both for the input and the output of information. These and other engineering and usability aspects, while being essential for commercialisation, are beyond the scope of this paper.

Challenges of intelligence analysis
Intelligence analysis is the application of individual and collective cognitive methods to evaluate, integrate, and interpret information about situations and events, aiming to provide warning regarding potential threats or to identify opportunities (Heuer, 1999). Various types of intelligence can be distinguished based on source. HUMINT (human intelligence), for example, is intelligence gathered from human sources. IMINT (imagery intelligence) is derived from image or video sources. OSINT (open source intelligence) is acquired from sources such as social media (Prunckun, 2010). More recent types of intelligence include, for example, crowdsourced intelligence, made up of structured information acquired from or volunteered by the general public (Stottlemyre, 2015). Analysts often specialise in a specific type of intelligence, and may be focused on particular objectives (e.g., tracking activities of a criminal organisation). In the military context, strategic analysts focus on studying long term objectives and intentions of foreign actors, while operational and tactical analysts are focused on supporting specific actions or providing timely responses to emerging situations. The examples used in this paper focus primarily on the operational and tactical analysis of HUMINT, although related work has also considered field intelligence (Toniolo et al., 2016) and OSINT (Cerutti et al., 2018b).
Pirolli and Card's (2005) model of intelligence analysis, one of the most influential in training and practice, conceptualises the process. It consists of two high-level iterative loops: foraging for information which is collected, filtered and collated into evidence files; and sensemaking, where the evidence files are interpreted through logical reasoning by drawing inferences and identifying hypotheses, which are then brought together to form a coherent explanation of the situation. A sketch of this process is shown within the box at the top of Fig. 1. The top row represents general features that characterise or influence the reasoning process during analysis. The other rows in this figure represent concepts related to different dimensions of intelligence analysis corresponding to a specific phase of analysis represented by the curly bracket in the column. More generally, this figure provides a reference for the components and challenges which inform the remainder of our discussion, and are ordered according to the topics covered in this Section.
Hypotheses formation and automation. While automation has been deployed in previous research for information collection (IBM, 2017; Wright et al., 2006) and evaluation of hypotheses (Billman et al., 2006; Burton and Knowles, 2010; Stefik, 2014), the analysis of information to form hypotheses remains a mostly human-driven task. This is difficult to fully automate due to the significant effort required to engineer both the explicitly available data for a situation, and the implicit knowledge that analysts use to form a hypothesis (Waltz, 2003). Heuer (1999) argues that these hypotheses are often created by analysts by adopting a historian approach to reconstruct a narrative looking at the available information to explain events. Recent experiments confirmed the use of narratives for collaborative analysis (Saletta et al., 2020). This approach is also advocated by Bex and Verheij (2012) in formalising reasoning with respect to a criminal investigation, where the evidence is used to establish the plausibility of an existing story.
The output of an analysis normally consists of an intelligence report that presents the most plausible hypothesis. Many hypotheses, however, are considered during the process of making sense of the situation. For example, Klein et al. (2006), while modelling the cognitive behaviour of analysts, argue that "people don't engage in simple mental operations of confirming or disconfirming a hypothesis." They propose a model in which data are interpreted according to a frame (such as a story, a diagram or a map) that is questioned and reframed as new information and links are formed. Their observations indicate that experts consider multiple, competing frames while making sense of events to establish those most accurate. More recently, Baber et al. (2016) used this framework to better understand the use of frames in the context of intelligence analysis. Two groups of participants, professional analysts and students, were observed and compared during an exercise to identify suspects of criminal activities. They concluded that "tools that support the collation of information...might help with down-collection of data but do not provide support for conflict and corroboration or for hypothesis-exploration", where down-collection means sampling the available data for material deemed to be relevant to the analysis (Baber et al., 2016). This is attributed to differences in how frames or representations are constructed, used and shared depending on participants' expertise. We note that there is little prior research, however, on how tools can facilitate the externalisation of an analyst's reasoning process while forming frames or hypotheses.
Analysis Issues. One of the main challenges to such facilitation is the need to identify factors that contribute to or hamper the externalisation of the reasoning process. Of primary importance among these factors is the role of evidence. Ambiguous, conflicting, unreliable or incomplete evidence might lead to multiple alternative hypothetical explanations of a situation, and such evidence in turn may be used to evaluate the strength of the different hypotheses due to the cyclic nature of hypotheses formation and evaluation.
Evidence may be ambiguous or conflicting for a variety of reasons. Heuer (1999) claims that most human-sourced information is second-hand at best, and furthermore this might be reported by sources that have varying degrees of trustworthiness. Information might have been purposely manipulated or simply reported by several sources from alternative points of view. Evaluating the provenance of this information is fundamental to forming a more objective assessment (Toniolo et al., 2014). When information is sought specifically to answer questions and requirements, for example through crowdsourcing (Stottlemyre, 2015), its quality varies significantly and analysts must employ methods to aggregate results and establish the truthfulness of this evidence. In addition, the analysis might be hampered by biases, such as confirmation bias, whereby an analyst (often unconsciously) prioritises information that confirms current beliefs, which may affect the accuracy of conclusions. Other biases are also associated with human working memory and its difficulties in remembering all the underlying reasons for an explanation, as well as the difficulties in revising links that have already been made on the basis of new information and its credibility (Heuer, 1999).
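To make the aggregation problem concrete, the following is a minimal sketch of how conflicting reports about a single yes/no claim could be combined with Bayes' rule. It is an illustration under stated assumptions, not the method used in CISpaces: the function name `posterior_true`, the per-source reliability values and the independence of sources are all hypothetical choices made for this example.

```python
from typing import Iterable, Tuple


def posterior_true(reports: Iterable[Tuple[bool, float]], prior: float = 0.5) -> float:
    """Posterior probability that a binary claim is true.

    Each report is (says_true, reliability), where reliability is the assumed
    probability that the source reports the truth. Sources are assumed to be
    independent, which is a strong simplification.
    """
    weight_true = prior          # running value of P(claim) * P(reports | claim)
    weight_false = 1.0 - prior   # running value of P(not claim) * P(reports | not claim)
    for says_true, reliability in reports:
        if not 0.0 < reliability < 1.0:
            raise ValueError("reliability must lie strictly between 0 and 1")
        if says_true:
            weight_true *= reliability
            weight_false *= 1.0 - reliability
        else:
            weight_true *= 1.0 - reliability
            weight_false *= reliability
    return weight_true / (weight_true + weight_false)


# Hypothetical reports: two sources assert the claim, one denies it.
example_reports = [(True, 0.8), (True, 0.6), (False, 0.7)]
example_posterior = posterior_true(example_reports)
print(round(example_posterior, 2))
```

With these invented numbers the posterior is about 0.72. The two supporting reports outweigh the dissenting one, yet the claim remains far from certain, which is the kind of residual doubt an analyst would still need to reason about.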
Analytic tools. Intelligence analysts are specifically trained in developing critical thinking through a variety of analytic approaches and techniques (Heuer, 1999; Prunckun, 2010; United Nations, 2011; US Army, 2006), to address the challenging process of identification of evidence and formation of hypotheses. These approaches are derived from logical and statistical models of reasoning, and are concerned with both analysing and interpreting data, and understanding and avoiding biases of different sorts. Examples include: link analysis that aims at identifying relationships between entities, resources, and events; red-team versus blue-team exercises where teams play the roles of attacker (red) and defender (blue); and SWOT (Strengths, Weaknesses, Opportunities and Threats) analyses. The resulting observations from these approaches are patterns and relationships that link information; such patterns vary significantly, but they share similarities in that logical inferences are made among events (in line with the historian view) and other elements of analysis related to these events such as entities, resources, indicators, etc. These patterns constitute inferences that are fundamental to hypotheses formation; however, the reasoning step that leads an analyst to make the observations and then construct a hypothesis is primarily manual and often an internal process, and as such unrecorded.
Hypotheses Assessment. Once hypotheses are identified, and in order to evaluate these against evidence, the Analysis of Competing Hypotheses (ACH) (Heuer, 1999) approach is considered fundamental among the analytic methods employed in both training and practice. This approach aims to provide a reliable evaluation of hypotheses and support the mitigation of reasoning biases. The application of ACH to a problem promotes a systematic and objective approach, which aids in the management of complexities inherent in real-world scenarios. A table is used to weigh alternative explanations by asserting whether there is some evidence in support of a hypothesis. Typically, ACH is performed manually, and so demands significant cognitive effort: the analyst is required to retain multiple hypotheses in memory together with the evidence acquired to support these hypotheses. The complexity of this process leads to a high risk of erroneous assessment of the plausibility of hypotheses, which can have substantial effects on the quality of an analysis. For example, the report of the Iraq inquiry (Chilcot, 2016), when referring to whether Iraq possessed weapons of mass destruction (WMD) in 2003, states that "Intelligence and assessments were used to prepare material to be used to support Government statements in a way which conveyed certainty without acknowledging the limitations of the intelligence" and that "The question is whether, in doing so, they conveyed more certainty and knowledge than was justified". The effect of the errors made in the assessment of plausibility of this hypothesis has been so significant that it is now a textbook exercise when training intelligence analysts (Lahneman and Arcos, 2014).
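The tabular weighing that ACH relies on can be illustrated with a minimal sketch. The hypotheses (H1, H2), evidence labels (E1 to E3) and ratings below are invented for illustration, and the scoring rule (prefer the hypothesis with the fewest inconsistent items, breaking ties by the number of consistent ones) is one common reading of the method rather than a procedure specified in this paper.

```python
from typing import Dict, List, Tuple

# Ratings used in this sketch: "C" = consistent, "I" = inconsistent, "N" = neutral.
AchMatrix = Dict[str, Dict[str, str]]


def rank_hypotheses(matrix: AchMatrix) -> List[Tuple[str, int, int]]:
    """Return (hypothesis, inconsistent_count, consistent_count) tuples,
    ordered from least to most contradicted by the evidence."""
    scores = []
    for hypothesis, ratings in matrix.items():
        inconsistent = sum(1 for rating in ratings.values() if rating == "I")
        consistent = sum(1 for rating in ratings.values() if rating == "C")
        scores.append((hypothesis, inconsistent, consistent))
    # Fewest inconsistencies first; more consistent items break ties.
    return sorted(scores, key=lambda score: (score[1], -score[2]))


# Hypothetical table: rows are hypotheses, columns are evidence items.
example_matrix: AchMatrix = {
    "H1": {"E1": "C", "E2": "C", "E3": "I"},
    "H2": {"E1": "C", "E2": "N", "E3": "C"},
}
example_ranking = rank_hypotheses(example_matrix)
for hypothesis, inconsistent, consistent in example_ranking:
    print(hypothesis, "inconsistent:", inconsistent, "consistent:", consistent)
```

Even this toy table shows why manual ACH is demanding: every added hypothesis or evidence item adds a full row or column of judgements that the analyst must make and keep in mind.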
Collaboration. In addition to the challenges listed above, analysis of complex, real-world situations is typically team-based within or across agencies (Kang and Stasko, 2011). Collaboration brings advantages in providing diverse and complementary perspectives and expertise, and mitigating biases. The differences between groups of analysts in expertise, resources, and capabilities may, however, lead to conflicts of opinions with regard to what hypotheses are plausible. While out of scope of the current work, we note that issues including policies restricting information sharing may also come into play (Verma, 2010). Externalisation of the reasoning process is key in collaborative analysis, and one of the most critical problems in supporting collaboration is to identify what part of the process should be externalised or shared with other collaborators (Mahyar and Tory, 2014). Existing models focus on information sharing and functional collaboration (Kang and Stasko, 2011). Information sharing in this context is the process of identifying information that may be of interest to others and sharing that which is relevant. This is very much an activity which occurs early on in the foraging loop, and is concerned with collecting and filtering information. Functional collaboration in this context is the process of editing reports to complete an analysis, and hence telling a combined story. These collaborative tasks are at either end of the Pirolli and Card conceptual model (Pirolli and Card, 2005). An additional type of collaboration, referred to as content-level collaboration (Kang and Stasko, 2011), sees analysts work together to structure and link information and evidence. For analytic problems to be addressed in time-stressed contexts, analysts from different organisations with different expertise, perspectives and resources need to work together to form these hypotheses within a common framework. Despite limited prior research on how content-level collaboration can be supported, this type of collaboration would be most beneficial as it would permit the elaboration and sharing of alternative hypotheses across a team, enabling more effective criticism and robust reasoning.
From the challenges highlighted within intelligence analysis, we must distil the most important and unaddressed issues. This is the focus of the next section, where we consider and prioritise the most important analyst requirements which have not yet been adequately addressed by existing tools.

Co-design with analysts
In this research, we study how novel AI methods advance performance in intelligence analysis by providing analysts with computational support in externalising their reasoning while building and comparing alternative hypotheses from interpretation of reports, observations, inferences, or crowdsourced data (potentially with various degrees of reliability) and conflicting or corroborating evidence. Addressing this challenge by providing meaningful computational support will bring advantages both for individual analysts and analysis teams. At the individual level, there is potential to provide automation in checking the validity of the reasoning process, support the understanding of where the information has come from, and to focus reasoning on important areas of uncertainty. At the team or collaborative level, advantages may manifest in improved mutual understanding of how other analysts reason, and may help elucidate others' opinions of how evidence links together to form hypotheses.
Our objectives are, therefore:
• OJ1: support the identification of what is believed to have happened from the evidence gathered (the sensemaking process);
• OJ2: support the integration of different forms of explicitly requested information (the crowdsourcing process);
• OJ3: support the assessment of information credibility according to the history of its collection and manipulation (the provenance reasoning process).
For the support provided to these underpinning processes to be effective, we aim to develop a principled approach that aligns with analysts' methods for analysis.
Identifying these research objectives is not sufficient. If we are to develop interventions that have the potential to be acceptable to and adopted by practitioners, we require a deeper understanding of the process of intelligence analysis as experienced by experts as well as their requirements and priorities. To achieve this, we closely collaborated with various groups of expert analysts in the US and UK to validate our objectives, co-develop a system to enact these objectives as well as a scenario to help understand the kind of tasks that analysts need support for, and finally to evaluate our system.
In the next subsections, we present the results of our work in exploring requirements in collaboration with experts to prepare the development of CISpaces. In particular, Section 3.1 discusses results from a focus group, Section 3.2 presents an intelligence analysis scenario which will be used throughout this paper, and Section 3.3 summarises the requirements presented in this section and emerging from our review in Section 2.

Requirement validation: Focus group
We now introduce the results of a focus group conducted to better understand the analysis process and refine and contextualise our objectives in line with analysts' experience. This focus group involved five experienced analysts from an international agency who consented to participate in this academic study and have their opinion analysed and reported in publications. Although this group was relatively small, participants brought extensive expertise to the study, and the discussion lasted for around three hours. The questions were exploratory in nature, but focused on four main topics:
• Inputs: the types of information collected, and issues of quality and credibility assessment.
• Reporting: outputs and conclusions delivered to decision-makers.
• Process: how analysts work in their daily activities.
• People: modes of collaboration among analysts.
The list of guiding questions is provided in Appendix D.1. The focus group discussion was recorded, transcribed and coded, and we report here on insights gained that relate to our objectives (OJ1-OJ3).¹ Three main themes emerged: analysts' attitudes towards the analysis, quality of the analysis, and features and tools characterising the analytic process. In Figs. 2 and 3 we summarise the results of the coding using an adapted Sankey diagram. For each theme, the graphs highlight the most relevant concepts (right-hand axis) according to the four topics (left-hand axis). Note that in a Sankey diagram the width of each link is proportional to the strength of the association according to the experimental data.

Analysts' attitude towards the analysis
Analysts, by training, are skeptical, and aware of potential biases. As indicated in Fig. 2(a), skepticism is maintained both for information received and during the process. Analysts invest significant effort in strategies to avoid biases, and are open to challenge, in particular through peer review. This highlights one important aspect of collaboration, which is aimed at the review of others' assessments during the analysis, as well as at the reporting phase. Peer review activities include, for example, the red-team versus blue-team technique (Section 2) and seem to align to a devil's advocate type of dialogue, where a proponent proposes a specific position, and others seek to contradict it, in a process of evaluation and progressive elimination of alternatives. This is then reflected in the reporting, where the main hypothesis is elaborated, but then strengthened by the alternative hypotheses that have been discounted.

Fig. 2. Theme analysis grouped by topics on the left-hand axis: (a) Analysts' attitudes towards analysis; and (b) Quality features of analysis.

Fig. 3. Theme analysis grouped by topics on the left-hand axis: (c) Features of analytical process; and (d) Tool objectives and requirements.

¹ NVivo was used for the analysis (QSR International, 1999).

Quality of the analysis
Analyses have specific characteristics that ensure their quality (see Fig. 2(b)):
• Factual and Precise: given that the aim is to increase situational awareness, explanations of what is happening must be grounded on facts. This is important with respect to inputs, the process and particularly in reporting. Access to primary source information was considered important in making precise assessments, but this is often difficult as analysts often have to rely on second-hand information.
• Reliable or Uncertain: the likelihood of a situation guides the prioritisation of assessment. Information is assessed on the basis of its likelihood and source reliability. In reports, information, hypotheses and their assessment are often characterised by specific wording expressing their likelihood.
• Timely: timeliness of information plays a crucial role in understanding a situation. Time of generation and time of receipt determine the cut-off date for information when conducting derivative analyses. Different types of analysis at strategic, operational or tactical level have different requirements.

Features and tools characterising the analytical process
The focus group also explored attitudes around features and tools characterising the analytical process (Fig. 3). This exploration focused on understanding the features of, and requirements for, support tools. Fig. 3(c) indicates strong reference to both informal argumentation and provenance.
The analysis of the transcript showed strongly positive attitudes towards the argumentative nature of the analysis process. We note that analysts are, by training, exposed to concepts such as informal argumentation and mind mapping as methods to organise, structure and assess the quality of the intelligence reports prepared. This assessment is based on supporting evidence and facts, where no hypothesis is considered wrong but can be rebutted by alternative hypotheses or facts through personal evaluation or peer review. Argumentation also appears important in reporting, where arguments both for the main hypothesis and against alternatives considered play a role. In this work, we leverage the analysts' familiarity with informal argumentation to apply computational argumentation, as discussed in Section 4.
The provenance of information and the analysis are both crucial features for reliable assessment, particularly when this is derivative. Recording how an analysis was formed and how source reliability is considered was viewed as having potential for training.
The discussion around the potential for, and use of, tools to support the analytical process drew heavily on participants' prior experience, but as indicated in Fig. 3(d) tools are viewed as important across all four of our main topics (inputs, reporting, process and people). Areas of specific interest for future tools included support for organising and improving integration of input information; collaborating with other analysts for peer review and training; and the process of assessment and report preparation. Fig. 3(b) summarises a small number of key concerns for novel tools, by far the most important being to save analysts' time.
The findings from our focus group exercise validate our objectives: analysts appreciate support in their sensemaking activities; value timely data; and treasure provenance of information.

An illustrative scenario
With the help of two US expert analysts, we co-created a realistic scenario to better understand the demands and requirements of the analysis process, to demonstrate our approach, and to support its evaluation.
This scenario centres on an investigation into possible water contamination in and around the fictional city of Kish. Reports from rural areas indicate an unidentified illness affecting livestock, and, from the city, an increase in patients reporting common symptoms. Analysts identify contamination of drinking water as a possible cause. An intelligence requirement is issued to determine whether this is accidental, or related to other suspicious activities such as a local pumping station explosion. The event of the explosion requires immediate response to both causes and the potential threats to the population. The analysis team requires support both to understand the evolving situation and to liaise with local authorities to gather further information about the spread of the illness in the region. Fig. 4 provides an overview of the scenario and a timeline of events, reporting on the upper part of the timeline information received, and on the lower part key parts of the analysis.

Intelligence analysis requirements and priorities
In this work, we aim to introduce AI techniques to facilitate analysis in a principled way that aligns with the methods analysts employ in their everyday activities. In Section 2, we explored key issues raised in the literature which motivate our objectives, while the focus group complements these objectives by providing further insight into the analysis process and by eliciting priorities. Table 1 provides a summary of requirements and challenges raised in the literature and by the focus group, alongside an indication of how the objectives, namely the support provided for the sensemaking (OJ1), crowdsourcing (OJ2), and provenance (OJ3) reasoning processes, align with the challenges identified. The table further organises the challenges following the coding scheme of the focus group. In this research, we mostly focus on hypothesis formation, leaving some of the requirements specifically related to the general collection of information and reporting for future research. In the next section, we discuss how our AI approaches have been designed to address these challenges.

Automated support for intelligence analysis
Grounded in the requirements and priorities elicited from the focus group sessions discussed in Section 3, we developed a tool, Collaborative Intelligence Spaces (CISpaces) (Toniolo et al., 2015), which uses multiple AI techniques to support intelligence analysts in:
• evidence-based sensemaking, by employing adapted argument schemes (Walton et al., 2008) to guide critical review of evidence, and a tailored model of argumentation-based reasoning (Prakken, 2010) to identify plausible hypotheses (OJ1);
• gathering crowd-sourced evidence, by interpreting responses to structured requests for information from groups of collectors (Kamar et al., 2012) and feeding the results back into the analysis as arguments (OJ2); and
• assessing the provenance of information, by inspecting provenance chains to identify critical meta-data that may inform the credibility of hypotheses (Toniolo et al., 2014), once again by interpreting them as arguments (OJ3).
Support is provided to analysts via an interface with two core components: the InfoBox, where collected information relevant to a task is streamed from external sources, typically from intelligence reports; and an individual WorkBox, the analytical space used in the construction of hypotheses. Each component in the analysis has a provenance chain attached. Different forms of collaboration are supported in CISpaces. Portions of the WorkBox can be shared between analysts via drag-and-drop, enhancing collaboration. An analyst may also canvass groups of contributors via the ReqBox, by creating forms for collecting structured information via crowdsourcing.
Fig. 5 provides a screenshot (edited to enhance its readability) of the system during use within our scenario. On the left, the InfoBox collects relevant information: for the purpose of this paper we assume that an information stream is connected to it, feeding information to the analysts. Analysts can then move selected pieces of information from the InfoBox into the WorkBox and use these to build their analysis. Recalling our running scenario from Section 3.2, among many theories, the contamination in Kish could potentially be explained by bacteria in the water supply system, and a local Non-Governmental Organisation (NGO) ran some tests for these. Depending on the type of bacteria, it is known that there might be a reaction with chlorine, used in the water system, which could release explosive gases. This, in turn, could potentially cause an explosion. However, the opposite could also be true: an Improvised Explosive Device (IED) that uses highly toxic chemicals could have been planted next to the pumping station, leading to poisoning of the local water supplies. In the next subsections, we describe the AI-based support provided to analysts for this ongoing evidence-based sensemaking activity. In particular, in Section 4.1 we discuss how evidence is structured and analysed using argumentation approaches following OJ1. Section 4.2 focuses on the crowdsourcing data collection and analysis following OJ2. Provenance assessment (OJ3) is demonstrated in Section 4.3. In the last part of this section, we provide an overview of how CISpaces was developed (Section 4.4).

Evidence-based sensemaking
In Sections 2 and 3, we highlighted several characteristics of sensemaking: hypotheses are formed by drawing inferences over information, guided by patterns such as those from link analysis, where each hypothesis can be considered as a story underpinned by a sequence of events (the historian approach). Multiple alternative hypotheses are considered during analysis, arising from conflicting information or from alternate events that explain this information. Furthermore, analysts are familiar with informal argumentation as the principal method to evaluate conclusions on the basis of arguments supported by evidence. CISpaces leverages these characteristics to provide automated reasoning support for analysts in structuring and elaborating individual and collaborative analysis. CISpaces makes extensive use of computational argumentation, in particular by introducing patterns to help draw inferences, and by providing a conceptual and computational framework to guide the identification of justified hypotheses.
A graphical representation of inferences enables analysts to form conclusions on the evidence acquired through the WorkBox in the CISpaces interface, and is based upon other argument mapping tools (Reed and Rowe, 2004; van Gelder, 2007). An inference rule is a set of propositions ($p_i$) divided into one or more premises that are linked with a conclusion. Premises and conclusions are represented as either white information nodes or light-blue claim nodes in Fig. 5 (green and purple, respectively, in the system; modified here for readability). For instance, to explain the information of illness within the population near the explosion in Kish ($p_{19}$ on the right-hand side of Fig. 5), our analyst might identify a premise (claim) that this is caused by the supplies used by the emergency services ($p_{18}$), forming an inference rule between $p_{18}$ and $p_{19}$ linked by a green round node called a Pro-link. We will write Pro-links as $p_{18} \rightarrow_p p_{19}$ within this paper. Note that Pro-links can be annotated within CISpaces to provide additional meta-information about the type of inference between nodes.
Propositions can also be in conflict with other components on the basis of an asymmetric contrariness relation (Prakken, 2010).Conflicting propositions are linked through red round nodes referred to as Con-links.
In the example in Fig. 5, the analyst might question whether the explosion released gas that is causing the illness ($p_{17}$). Specifically, the Con-links capture contradicting relationships between pieces of information: $p_{17}$ is the contrary of $p_{18}$, and $p_{18}$ is the contrary of $p_{17}$; hence $p_{17}$ and $p_{18}$ are said to be contradictory, as they represent two alternative, mutually exclusive interpretations of reality (conclusion $p_{19}$). We will write Con-links as, for example, $p_{17} \rightarrow_c p_{18}$.

Patterns of inference for intelligence analysis
The underlying structure used to support analysts in drawing inferences is based on argumentation schemes: reasoning patterns that commonly occur in human reasoning and dialogue (Walton et al., 2008). They represent templates for making presumptive inferences formed by premises supporting a conclusion, and by critical questions (CQs) that can be put forward against the applicability of the inference. A commonly used example is the argument from expert opinion, used to describe an assertion warranted by expertise:
- Source E is an expert in domain S containing proposition A
- E asserts that proposition A is true
⇒ Therefore, A may plausibly be true.
For instance, $p_3$ = "There is toxic bacteria in the water supply system" might be the result of analysts' speculation (hence being a claim) and the subject of further investigation. Such an investigation can be summarised by an argument from expert opinion reporting a test that was conducted by an NGO laboratory ($p_2$), which points towards expertise, with a second premise relating to the assertion, namely the reporting of toxic bacteria in the Kish water supply ($p_1$). In Fig. 5, the link $p_1, p_2 \rightarrow_p p_3$ can be labelled with an argumentation scheme, in this case "LEO," standing for "Link from Expert Opinion." There are, however, no explicit statements about this NGO laboratory having expertise in testing bacteria in water samples, or regarding other issues such as their reliability. To this end, critical questions can be asked to strengthen (or weaken) arguments instantiated from argumentation schemes. Some of the critical questions relevant to an argument from expert opinion are (Walton et al., 2008):
CQEO1: How credible is E as an expert source?
CQEO2: Is E an expert in the field that A is in?
CQEO3: Is A consistent with the testimony of other experts?
Unique to CISpaces, to our knowledge, is how the critical questions are used to drive further analysis. When the analyst selects a question that must be answered for the conclusion to be acceptable, the system generates a negative answer in a new node connected via a Con-link; e.g., for CQEO1, "Source E is not an expert source." This asymmetric conflict prompts the analyst to challenge assumptions that may lead to bias, and requires them to find a reason for source E to be considered an expert, as otherwise the conclusion that there is bacteria in the water supply ($p_3$) would remain unsupported in the evaluation of hypotheses, as discussed later in this section.
To provide analysts with a coherent model, we worked closely with experts to identify the most common argumentation schemes used in intelligence analysis. Analysts are mostly concerned about: Activities (Act) including actions performed by actors, and events happening in the world; Entities (Et) including individuals or groups, and objects such as resources; and Facts (Ft) including statements about the state of the world regarding entities and activities. There are several critical relations among these elements: causal relations representing the distribution of activities, their correlation and (possible) causality; and relations that connect entities and activities through temporal, geographic or thematic associations. Intelligence elements then act as premises for inferences, and conclusions are tentatively drawn by discovering relations among them. In line with the historian approach (see Fig. 1), analysts then use these relations to reconstruct a narrative that explains events forming alternative hypotheses. According to the type of relation (causal or associative) we can now instantiate two main types of schemes for the sensemaking process (cf. Fig. 6).
An argument scheme from cause to effect may be used to provide an explanation for some set of observations on the basis of activities and events that shows how the situation has evolved. This is referred to as an inference link LCE, and considers a cause C (referring to some fact $Ft_i$ or activity $Act_i$), its effect E (also referring to some fact or activity), and a causal rule that links C to E. In Fig. 6 we present LCE, which has been adapted from Walton et al. (2008). In our previous example, as illustrated within Fig. can then be considered instances of a causal argument scheme. Instances of the causal argumentation scheme form a chain of events that constitute the backbone of the hypothesis. These can further be reviewed through critical questions, challenging the instantiation of the causal argumentation scheme (CQCE4); the order of events (CQCE5); and evidence for the premises (CQCE1, CQCE2, CQCE3). Critical question CQCE6 has a different purpose. In CISpaces, analysts are required to represent a cause as a Pro-link to an effect, but analysts may have evidence for the effect and infer a plausible cause using abductive reasoning. In this case, alternative causes must be considered. CQCE6 is used to consider these alternatives by interpreting this question as a rebuttal for C. CQCE6 then results in alternative incoming nodes to the Pro-link representing a contradictory relation between causes.
An argument for identifying an agent from past actions (Walton et al., 2008) (LID, Fig. 6) encodes the sensemaking process that shifts from understanding what happened to understanding what entities were involved and their association with the activity. In this scheme, properties, H, are facts Ft of type "$Et_i$ is affected by $Act_i$/$Et_j$" or "$Et_i$ is in the location $Et_j$ of $Act_i$". An instantiation of this scheme can, for example, assert that the observed activity $Act_i$, "An unidentified person (aka Jane Doe) of interest planted an improvised explosive device (IED) at the pumping station" ($p_6$), together with certain properties (e.g., $p_8$), supports the conclusion "Jane Doe planted the IED" ($p_7$) through $p_6, p_8 \rightarrow_p p_7$. Associated critical questions CQID1, CQID2 and CQID3 can be used to review the inference by challenging, respectively: that the act (planting the IED) has occurred; that the entity (Jane Doe) has some property (seen at the location); and that the act requires that property. CQID4 identifies alternative conclusions, while CQID5 challenges the instantiation of the scheme.
The links between these two major schemes for causal and associative relations are primarily forged through questions CQCE1 and CQID1. A response to question CQCE1, for example, may claim that some entity, $Et_i$, was associated with the cause. In this way, LID may be used to answer a challenge to an instance of LCE. Similarly, an instance of LCE may answer question CQID1, which is concerned with whether some activity happened, linking association back to causality. In addition, different schemes may be used by the analyst to respond to the various critical questions (Walton et al., 2008). Examples include: arguments from the group, where properties of a member are applied to an organisation for providing evidence to LID; an argument from analogy reporting a case with similarities to LCE; or an argument from sign to explain that an event is likely to happen if its indicator is verified, following common indicators such as those presented in training manuals (US Army, 2020).
Fig. 6. Argumentation schemes and critical questions for intelligence analysis.

Hypotheses identification
Following our running example, suppose that analysts have prepared the analysis shown in Fig. 5 to identify what the coherent explanations for this situation are, in order to evaluate their hypotheses.
CISpaces provides automated support to identify what claims and pieces of evidence can together form a plausible hypothesis, and what other plausible alternatives exist, by employing computational models of argumentation. In such models, a fundamental concept is that of an inference rule, where a statement (antecedent) becomes a (prima facie) reason to believe another statement (consequent). For instance, "Reports of Toxic bacteria in the Kish water supply" (antecedent) can be seen as a prima facie reason to believe that "There are toxic bacteria in the water supply system" (consequent). In this research, we only make use of a small set of concepts derived from formal argumentation, specifically borrowing from the ASPIC literature (Modgil and Prakken, 2014) (see Appendix A for further details). For example, scholars in this area distinguish between strict and defeasible rules in their approach to formal argumentation, and use preferences to establish defeats between arguments (Modgil and Prakken, 2014). Analysts by training are familiar with informal argumentation concepts (see Section 3), including premises and conclusions of an argument, and supporting and conflicting arguments. In this research, in order to limit the training burden for analysts, we chose concepts which we could align with these informal concepts, limited to the small set which we deemed necessary for representing and evaluating an analysis. We therefore do not make use of strict rules or preferences in this work, and we will not discuss those further.
Rules provide the building blocks for the notion of argument, which is defined iteratively by chaining rules. Statements that are tentatively assumed to hold provide the base case for such an iteration, and thus they are defined as arguments having the statement itself both as a singular premise and as a conclusion, where such premises and conclusion are two attributes of an argument. The premises of arguments constructed using this base case also take the name of ordinary premises in our approach. As an iterative step, an argument requires the existence of a rule whose antecedents are the conclusions of other arguments (sub-arguments), and whose consequent is a statement that forms the conclusion of this new argument. The premises of such a (compound) argument are the union of all the premises of its sub-arguments. A statement is the contrary of another one when they cannot both be accepted, although they can both be rejected. Borrowing from the literature, a flexible way of using such a notion of contrariness is to allow a statement to be the contrary of another one, while not explicitly requiring the opposite. Two statements which are contrary to each other are said to be contradictory, as mentioned above.
The notion of contrariness between statements leads to the concept of defeat between arguments: an argument defeats another argument if the former rebuts or undermines the latter. When the conclusion of an argument contradicts the conclusion of another argument, the first rebuts the second, as well as any other compound argument that has the second argument as a sub-argument. If, instead, the conclusion of an argument contradicts one of the premises of another one, then the former is said to undermine the latter. In this work, exceptions to the application of a rule of an argument scheme are also treated as undermining arguments against implicit premises.
The graphical map of inferences constructed by the analyst, cf. Fig. 5, is transformed into the corresponding premises, contrariness relationships and inference rules as follows:
• Premises are those propositions that are not conclusions of inferences (i.e., have no incoming Pro-link edges); they constitute part of the knowledge base.
• A contrary relationship is added if a Con-link is drawn between two propositions. In addition, critical questions that point towards alternative conclusions are mapped as contradictory relationships.
• Pro-links map to inference rules.
In our previous example, the LEO Pro-link $p_1, p_2 \rightarrow_p p_3$ represents an inference rule $r: p_1, p_2 \Rightarrow p_3$ and gives rise to three arguments: $Arg_1$ with $p_1$ both as premise and as conclusion; $Arg_2$ with $p_2$ both as premise and as conclusion; and $Arg_3$ with $\{p_1, p_2\}$ as premises and $p_3$ as conclusion following the rule $r$.
On the basis of the asymmetric contrariness relation, an argument can attack another one: if the conclusion of an argument $Arg_X$ is the contrary of one of the premises of argument $Arg_Y$, then $Arg_X$ undermines $Arg_Y$; if the conclusion of $Arg_X$ and the conclusion of $Arg_Y$ are contradictory, then the two arguments rebut each other.
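The transformation and attack definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CISpaces implementation: the names `Argument`, `build_arguments` and `derive_attacks` are hypothetical, arguments are reduced to their ordinary premises and conclusion, and rebuttal of compound arguments through their sub-arguments is omitted. The example uses a simplified excerpt of the map in Fig. 5.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Argument:
    premises: frozenset  # ordinary premises the argument ultimately rests on
    conclusion: str


def build_arguments(propositions, pro_links):
    """Chain Pro-links, given as (antecedents, consequent), into arguments."""
    concluded = {consequent for _, consequent in pro_links}
    # Base case: a proposition that no Pro-link concludes is an ordinary premise.
    arguments = {Argument(frozenset([p]), p)
                 for p in propositions if p not in concluded}
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in pro_links:
            # Each antecedent must be the conclusion of an existing argument.
            options = [[a for a in arguments if a.conclusion == x]
                       for x in antecedents]
            for combo in product(*options):
                premises = frozenset().union(*(a.premises for a in combo))
                new = Argument(premises, consequent)
                if new not in arguments:
                    arguments.add(new)
                    changed = True
    return arguments


def derive_attacks(arguments, contrary):
    """`contrary` holds pairs (a, b) meaning 'a is the contrary of b'."""
    attacks = set()
    for x in arguments:
        for y in arguments:
            undermines = any((x.conclusion, p) in contrary for p in y.premises)
            rebuts = ((x.conclusion, y.conclusion) in contrary
                      and (y.conclusion, x.conclusion) in contrary)
            if undermines or rebuts:
                attacks.add((x, y))
    return attacks


# Simplified excerpt of Fig. 5: two Pro-links and one pair of Con-links.
propositions = ["p1", "p2", "p3", "p17", "p18", "p19"]
pro_links = [(("p1", "p2"), "p3"), (("p18",), "p19")]
contrary = {("p17", "p18"), ("p18", "p17")}

arguments = build_arguments(propositions, pro_links)
attacks = derive_attacks(arguments, contrary)
for attacker, target in sorted(
        attacks, key=lambda pair: (pair[0].conclusion, pair[1].conclusion)):
    print(attacker.conclusion, "attacks the argument for", target.conclusion)
```

In this sketch the argument for $p_{17}$ attacks both the argument for $p_{18}$ and the compound argument for $p_{19}$, because $p_{18}$ is among the premises of the latter.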
Arguments and attacks form a Dung argumentation framework (Dung, 1995) from which sets of acceptable arguments surviving the attack together (extensions) can be computed according to a semantics. In this work we consider the preferred semantics. This semantics selects maximal sets of arguments: extensions that are maximal with respect to set inclusion, conflict free (i.e., no arguments in an extension attack each other), and admissible (i.e., each argument in the extension is defended against the attacks it receives).
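To make the definition concrete, the following sketch enumerates preferred extensions by brute force: it tests every subset for admissibility and keeps the maximal ones. It is only suitable for very small frameworks and is not the solver used in CISpaces; the function name and the three-argument framework (hypothetical labels `A17`, `A18`, `A19` for the arguments concluding $p_{17}$, $p_{18}$ and $p_{19}$) are assumptions made for illustration.

```python
from itertools import combinations


def preferred_extensions(arguments, attacks):
    """Return the maximal (w.r.t. set inclusion) admissible sets."""
    arguments = list(arguments)

    def conflict_free(subset):
        return not any((x, y) in attacks for x in subset for y in subset)

    def defends(subset, argument):
        # Every attacker of `argument` must itself be attacked by the subset.
        attackers = [x for x in arguments if (x, argument) in attacks]
        return all(any((d, b) in attacks for d in subset) for b in attackers)

    def admissible(subset):
        return conflict_free(subset) and all(defends(subset, a) for a in subset)

    candidates = [frozenset(c)
                  for size in range(len(arguments) + 1)
                  for c in combinations(arguments, size)]
    admissible_sets = [s for s in candidates if admissible(s)]
    return [s for s in admissible_sets
            if not any(s < t for t in admissible_sets)]


# A17 and A18 attack each other; A17 also attacks A19, which builds on p18.
framework_arguments = ["A17", "A18", "A19"]
framework_attacks = {("A17", "A18"), ("A18", "A17"), ("A17", "A19")}

extensions = preferred_extensions(framework_arguments, framework_attacks)
for extension in extensions:
    print(sorted(extension))
```

For this framework there are two preferred extensions, one containing only `A17` and one containing `A18` and `A19`, which mirrors the two mutually exclusive interpretations discussed for $p_{17}$ and $p_{18}$.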
For the first time in the intelligence analysis and argumentation literature, to our knowledge, we associate each extension with an intelligence analysis hypothesis. The process of intelligence analysis is driven by the identification of hypotheses and a discussion of any potential alternative that may explain the information received about a situation. This is a key concept in the intelligence literature (see Section 2) and emerged strongly during our focus group (Section 3). Analytic methods such as red-team versus blue-team or the Analysis of Competing Hypotheses show that this process is embedded in analysts' work (e.g., Heuer, 1999; Klein et al., 2006). The preferred semantics is a multiple-status semantics which provides a set of alternative labellings and, therefore, is well suited to represent alternative explanations for a situation. A set of acceptable arguments identified in the extension permits the extraction of acceptable propositions about events, entities and activities that together are plausible. This set is formed by the conclusions of the arguments. We refer to it as a single, coherent hypothesis, or a plausible hypothesis in short. This set is then presented to the analyst using colour-coded symbols in the interface. For each hypothesis, green (V) is used to indicate a supported conclusion of an argument belonging to the extension (IN), and red (X) is used to indicate an unsupported conclusion of an argument that is attacked by some argument of the extension (OUT).² The formal correspondence between argumentation semantics extensions and identification of alternative hypotheses is discussed in Appendix A.
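The colour coding described above can be sketched as a labelling function over a single extension. This is a hedged illustration rather than the interface code: the function name and the argument labels are hypothetical, and the `UNDEC` label stands for the undecided case that the paper defers to Appendix A.

```python
def label_arguments(arguments, attacks, extension):
    """Label each argument relative to one extension (one hypothesis)."""
    labels = {}
    for argument in arguments:
        if argument in extension:
            labels[argument] = "IN"      # shown in green as supported
        elif any((x, argument) in attacks for x in extension):
            labels[argument] = "OUT"     # shown in red as unsupported
        else:
            labels[argument] = "UNDEC"   # neither accepted nor attacked
    return labels


# Hypothetical three-argument framework with one of its preferred extensions.
framework_arguments = ["A17", "A18", "A19"]
framework_attacks = {("A17", "A18"), ("A18", "A17"), ("A17", "A19")}

hypothesis_labels = label_arguments(
    framework_arguments, framework_attacks, extension={"A18", "A19"})
for argument, status in hypothesis_labels.items():
    print(argument, status)
```

Running the same function with the other extension flips the labels, which is how alternative hypotheses would appear side by side to the analyst in this sketch.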
To recall our previous example, we can extract three hypotheses as shown in Fig. 7, each corresponding to the conclusions of arguments accepted by one of three preferred extensions. In addition, unsupported statements for a hypothesis are also highlighted, which in turn might show analysts the consequences of not responding to a critical question (as per our previous example in Section 4.1.1). Assuming that there is no evidence that the NGO is an expert in water contamination ($p_2$), as suggested by CQEO1, this invalidates the first two hypotheses.
² There is also a third case of undecided conclusions. Further details can be found in Appendix A.
To conclude our section on sensemaking, our novel approach includes a tailored set of argumentation schemes for intelligence analysis and an automatic use of critical questions to effectively support analysts in reducing their cognitive biases, leveraging in full the computational argumentation paradigm by creating directed attacks from unanswered critical questions.In addition, with our approach analysts can automatically identify alternative hypotheses, thanks to the correspondence between an extension and a plausible hypothesis.An analyst can more readily observe the effects of adding further information or advancing critiques to parts of the analysis on the set of plausible hypotheses, demonstrating visually the availability of a new hypothesis or the rejection of an unsubstantiated one.

Crowd-sourced evidence
Continuing with our running example, the analyst might require additional location-sensitive information to ascertain whether there is evidence that Kish tap water is contaminated, $p_9$. As highlighted by analysts in our interviews (see Section 3), timely information is crucial in this context, in particular when the contamination may explain why people are falling ill, as this would allow for more rapid intervention. To this end, analysts could initiate a request for information distributed to the local population to collect evidence about the status of the tap water in Kish. Crowdsourcing uses human computation to sense information and discover truth in a timely, large-scale and cost-efficient manner (Brabham, 2008; Kamar et al., 2012; Whitehill et al., 2009), and it is particularly effective in event detection (Ouyang et al., 2016b). How to interpret and integrate such crowdsourced evidence into an analysis is, however, an open issue.
In CISpaces, we proposed an online method to analyse results of the reports and instantiate them within a novel argumentation scheme which integrates these results into the analysis. More information about the formalism can be found in Appendix B.

Task initialisation
In CISpaces, a crowdsourced query task is initiated by asking specific CQs; e.g., a claim $p_t$ may be challenged by the analyst via the question "Is there evidence for $p_t$?" In our example, this is initiated as "Is there evidence that the tap water in Kish is contaminated?" We assume that after some time, people in Kish respond to this request by reporting the colour and temperature of their tap water.
In relation to a specific task, we can automatically introduce novel data and inference links in the graph of arguments from a number of questions Q for the crowd, together with associated information enabling data collection and aggregation of results. An example could include the two questions $Q = \{q_0, q_1\}$:
• $q_0$: "What is the temperature of your cold water?", of numerical type. If the temperature reported is < 20 °C the results will provide evidence against the claim $p_t$ that the tap water in Kish is contaminated; otherwise the response provides evidence for the claim.
• $q_1$: "What colour is your tap water?", of categorical type with m possible categories. If the water is Clear or White this would be evidence against the claim, while Brown and Yellow are considered as evidence for the claim.
The task terminates when it reaches a deadline or some pre-specified number of reports are acquired.

Analysis of results
The results are aggregated in different ways depending on the type of data. For categorical data we are interested in knowing the probability of the categories of a multi-valued answer to question $q_k$. Using a Dirichlet prior for this multinomial distribution, the posterior is thus a Dirichlet distribution (Jøsang and Haller, 2007) that combines prior beliefs and collected reports for question $q_k$, from which we obtain a vector of expected values $\epsilon_k$ for the m categories of question $q_k$. The prior used in the simplest case is a uniform distribution over the answers, but a more sophisticated approach would consider crowd features such as reliability and location by manipulating the prior (e.g., Etuk et al., 2013; Ouyang et al., 2016b). For numerical data, we consider a weighted mean $\mu_k$ of the collected reports for $q_k$ where, in the simplest case, weights are all assumed to be 1, although these may vary according to features of the reports, as for the prior probability.
A novel aspect of our approach occurs after aggregating the results for each question q k , when CISpaces uses the task definition to automatically build a partial argument map that is integrated within the overall analysis.The argument from generally accepted opinion (Walton et al., 2008), LCS, represents the defeasible inference that a statement is plausible if a significant majority in a group accepts it.
- Given that the crowd was asked $q_k$ and
- Answer A is generally accepted as true
⇒ Therefore, A may plausibly be true
Critical questions focus on whether the crowd is believable, or corroborating evidence is needed to accept the conclusions:
CQCS1: Is the claim A supported by evidence?
CQCS2: Is the group in a position to know about $q_k$?
CQCS3: Is the claim consistent with others' claims?
CQCS4: Does the group present characteristics appropriate for answering $q_k$?
Fig. 7. CISpaces analysis and evaluation: Three alternative hypotheses.
The system constructs an LCS argument for each question $q_k$, where the answer A corresponds to the mean $\mu_k$ for numerical questions or, for categorical ones, to the category with maximal expected value $\epsilon_j \in \epsilon_k$. Each conclusion provides evidence either for or against the main claim $p_t$.
In our running example, assume we have collected 10 reports for $q_0$ and $q_1$ such that:
• $q_0$: {21, 22, 25, 24, 18, 17, 22, 20, 23, 19} with $\mu_0 = 21.1$
• $q_1$: {Clear: 6, Brown: 1, Yellow: 2, White: 1} with corresponding expectation $\epsilon_1 = (0.542, 0.125, 0.208, 0.125)$
Fig. 8 illustrates the CISpaces interface for data collection, consisting of the data itself and information such as the location and time of responses. CISpaces allows for inspection of the data before importing, and the imported section of the results (from which arguments are derived) is shown on the right-hand side of the diagram. The LCS argument $p_{24}, p_{25} \rightarrow_p p_{26}$ on the right-hand side states that the temperature of the tap water is 21.1 °C, reporting the result of $q_0$. The LCS argument for $q_1$, $p_{20}, p_{21} \rightarrow_p p_{22}$, reports that the colour of the tap water is Clear. CISpaces also uniquely aggregates the results of the various questions, once again in a purely argumentative manner, stating that the group provides evidence either for or against the claim ($p_{27}$ and $p_{23}$, respectively, in Fig. 8). Hence, if all evidence is for a claim, the claim will be accepted (assuming no other arguments exist against the claim); otherwise the claim will not be accepted. In our example, however, since there is no decisive evidence, we currently obtain an inconclusive result, and the two hypotheses are still valid.
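The aggregation in this example can be reproduced with a short sketch. The formalism itself is in Appendix B and is not restated here, so two choices below are assumptions: the expected values follow the Dirichlet expectation in the style of Jøsang and Haller (2007) with a uniform base rate and a prior weight of 2, which happens to reproduce the reported vector $\epsilon_1$; and the final `outcome` variable is a plain-Python stand-in for the argumentative aggregation that CISpaces performs. All function names are hypothetical.

```python
def weighted_mean(values, weights=None):
    """Weighted mean of numerical reports; weights default to 1."""
    weights = weights or [1.0] * len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def dirichlet_expectation(counts, prior_weight=2.0):
    """Expected category probabilities under a uniform Dirichlet prior.

    `prior_weight` (assumed to be 2 here) is spread evenly over the categories.
    """
    m = len(counts)
    total = sum(counts.values())
    return {category: (n + prior_weight / m) / (total + prior_weight)
            for category, n in counts.items()}


temperatures = [21, 22, 25, 24, 18, 17, 22, 20, 23, 19]        # reports for q0
colours = {"Clear": 6, "Brown": 1, "Yellow": 2, "White": 1}    # reports for q1

mu_0 = weighted_mean(temperatures)
eps_1 = dirichlet_expectation(colours)
top_colour = max(eps_1, key=eps_1.get)

# Thresholds taken from the task definition: below 20 °C, or Clear/White
# water, counts as evidence against the contamination claim.
verdict_q0 = "against" if mu_0 < 20 else "for"
verdict_q1 = "against" if top_colour in {"Clear", "White"} else "for"
outcome = verdict_q0 if verdict_q0 == verdict_q1 else "inconclusive"

print(round(mu_0, 1), {c: round(e, 3) for c, e in eps_1.items()})
print(verdict_q0, verdict_q1, outcome)
```

Under these assumptions the temperature question yields evidence for the claim and the colour question evidence against it, which is the inconclusive situation described in the text.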
To conclude, gathering additional information is necessary to avoid the rejection of hypotheses on the basis of insufficient evidence (Heuer, 1999).Our novel approach to crowdsourcing, evidence interpretation and automated integration of the outcome(s) into an analysis using specifically designed argumentation schemes and procedures, provides an effective method to integrate this form of human intelligence into the sensemaking process.

Provenance
As previously described, each component in the analysis, whether input information or the analysis itself, has a provenance chain attached: data representing the phases of manipulation of that component from its primary sources. In our focus group (Section 3) analysts highlighted that the origins of information (including information from the crowd), and how and by whom this information is interpreted during analysis, are important factors in establishing the credibility of hypotheses. Provenance can be used to annotate how, where, when and by whom some information was produced (Moreau and Missier, 2013). Understanding the provenance of information more broadly, however, is fundamental to assessing its credibility. Information may of course come from sources of varying veracity, but it may also have been manipulated or combined with other information before reaching the analyst, and the relative timeliness of information is important for many problems. The interpretation of information and understanding how information and hypotheses are linked must take into consideration all aspects of information provenance. Further, when we consider that analysis of more complex, real-world situations is typically team-based and may involve hand-over between teams or involve multiple agencies, it is important to understand an individual's contributions and what data was used to reach conclusions (Wu et al., 2013). Inspecting long provenance chains to identify the provenance information relevant to assessing credibility, however, remains cognitively demanding. Through the use of argumentation schemes, here we extract relevant provenance data to be introduced in the analysis, following our previous work (Toniolo et al., 2014).

Recording provenance
Provenance is recorded in CISpaces using the W3C standard PROV Data Model (Moreau and Missier, 2013). PROV-DM expresses provenance in terms of p-entities ($A_{pv}$), p-activities ($P_{pv}$), and p-agents ($Ag_{pv}$) that have caused an entity to be, and defines different relationships between these elements. Note that in PROV-DM these elements are referred to as entities, activities and agents; we use the p-prefix to refer to provenance elements explicitly.
The left part of Fig. 9 illustrates a provenance graph used and manipulated by CISpaces for the information node $p_{10}$: "Emergency response may be using local water supplies," cf. Fig. 5. Reading the graph from right to left, we can see orange round nodes that are directly associated with the information nodes in CISpaces, and hence that Joe, an analyst (p-agent), has imported a piece of information within CISpaces. We can also walk back in time, and thus see that this information has been delivered to the "InfoBox" by a "NGO_Officer" who has communicated key data extracted from a "Crisis_Report", all the way back to the primary sources (i.e., those that first reported or created the information): "Field_Observations" of the area of interest and local "Water_Samples".
Therefore, the provenance chain of a node p j is represented as a directed acyclic graph G P (p j ) of relationships between A pv , P pv , and Ag pv .G P (p j ) is a joint path from the node containing p j to its primary sources; i. e., sources that first produced the information.More details on the formal treatment of provenance graphs is provided in Appendix C.

Reasoning about provenance
A provenance chain G P (p j ) can be queried as a graph pattern P m , which is a structured graph whose nodes are variables on the p-elements. Following our previous work (Toniolo et al., 2014), we consider three commonly used patterns for intelligence analysis: P g , indicating how a p-entity was generated; P s , used to identify the primary sources used in the generation of a p-entity; and P t , which connects a piece of information with its intelligence requirement. For example, consider in Fig. 9 the provenance chain involved in the acquisition of p 10 = "Emergency response may be using local water supplies" (cf. Fig. 5), referred to as p-entity "Node 98c5...". A pattern P g shows that the piece of information p 10 contained in the InfoBox node ("Node 19f8...") has been extracted from the Crisis Report and communicated to analyst Joe by the NGO Officer. P s , highlighted in blue in Fig. 9, shows that at the source of p 10 the Crisis Report was created on the basis of field observations and water samples by the safety coordinator.
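As an illustration of how such patterns might be queried, the following minimal Python sketch encodes a chain loosely modelled on the description of p 10 and implements simplified versions of P s and P g . The relation names are those of PROV-DM, but the exact edges, the activity names (`communicate`, `compile_report`) and the helper functions are assumptions made for this example; they are not the CISpaces implementation, which relies on the services described later.

```python
from collections import defaultdict

# Hypothetical PROV-style edges (subject, relation, object). The node and
# activity names are illustrative; the real graph in Fig. 9 may differ.
EDGES = [
    ("Node_p10", "wasDerivedFrom", "InfoBox"),
    ("Node_p10", "wasAttributedTo", "Joe"),
    ("InfoBox", "wasGeneratedBy", "communicate"),
    ("communicate", "wasAssociatedWith", "NGO_Officer"),
    ("communicate", "used", "Crisis_Report"),
    ("Crisis_Report", "wasGeneratedBy", "compile_report"),
    ("compile_report", "wasAssociatedWith", "Safety_Coordinator"),
    ("compile_report", "used", "Field_Observations"),
    ("compile_report", "used", "Water_Samples"),
]

# Relations that lead "back in time" towards the origin of an entity.
LINEAGE = {"wasDerivedFrom", "wasGeneratedBy", "used"}


def build_index(edges):
    """Map each subject to its outgoing (relation, object) pairs."""
    index = defaultdict(list)
    for subject, relation, obj in edges:
        index[subject].append((relation, obj))
    return index


def primary_sources(index, start):
    """Simplified P_s: nodes reached by lineage relations that have no origin."""
    seen, stack, sources = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = [o for rel, o in index.get(node, []) if rel in LINEAGE]
        if not parents and node != start:
            sources.add(node)
        stack.extend(parents)
    return sources


def generation_pattern(index, entity):
    """Simplified P_g: activity, agents and inputs that generated an entity."""
    matches = []
    for rel, activity in index.get(entity, []):
        if rel != "wasGeneratedBy":
            continue
        outgoing = index.get(activity, [])
        matches.append({
            "activity": activity,
            "agents": [o for r, o in outgoing if r == "wasAssociatedWith"],
            "inputs": [o for r, o in outgoing if r == "used"],
        })
    return matches


index = build_index(EDGES)
print(sorted(primary_sources(index, "Node_p10")))
print(generation_pattern(index, "InfoBox"))
```

The sketch treats any node without an outgoing lineage relation as a primary source, which is a simplification; a fuller treatment would distinguish p-entities from p-activities.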
These patterns allow us to check the presence of relevant provenance information that may warrant the credibility of p j and the information about activities (Act i ), entities (Et i ), or facts (Ft i ) p j is concerned with. The patterns can be integrated into the analysis by applying the argument scheme for provenance (LPV) we first introduced in a previous paper (Toniolo et al., 2014):
- Given information p j
- The provenance chain G P (p j ) of p j includes pattern P m of p-entities A pv , p-activities P pv , p-agents Ag pv involved in producing p j
- P m is a reason to believe that information p j is true
⇒ Therefore, p j may plausibly be true
Critical questions for this scheme are:
CQPV1: Is p j consistent with other information?
CQPV2: Is p j supported by evidence?
CQPV3: Does G P (p j ) contain p-elements that lead us not to believe p j ?
CQPV4: Is there any other p-element that should have been included in G P (p j ) to infer that p j is true?
A question "Can it be shown that the information is verifiable?" (e.g. CQID1, CQCE1, cf. Fig. 6) shifts the reasoning process to provenance analysis. Questions CQPV1 and CQPV2 shift back to sensemaking by requiring further evidence for Act i , Et i , or Ft i to be supported.
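The structure of the LPV scheme can be sketched as a small data structure. The Python below is only an illustration under simplifying assumptions: the class name, its fields and the rule that any raised critical question withdraws support are choices made for this example, not the CISpaces implementation.

```python
from dataclasses import dataclass, field

# Critical questions of the LPV scheme, as listed in the text.
CRITICAL_QUESTIONS = {
    "CQPV1": "Is p_j consistent with other information?",
    "CQPV2": "Is p_j supported by evidence?",
    "CQPV3": "Does G_P(p_j) contain p-elements that lead us not to believe p_j?",
    "CQPV4": "Is there any other p-element that should have been included "
             "in G_P(p_j) to infer that p_j is true?",
}


@dataclass
class ProvenanceArgument:
    """Hypothetical container for one instantiation of the LPV scheme."""
    claim: str                 # the information p_j
    pattern: str               # description of the matched pattern P_m
    challenges: dict = field(default_factory=dict)  # CQ id -> reason

    def challenge(self, question_id, reason):
        """Record a critical question raised against this argument."""
        if question_id not in CRITICAL_QUESTIONS:
            raise KeyError(f"unknown critical question: {question_id}")
        self.challenges[question_id] = reason

    def status(self):
        """Simplification: any raised critical question withdraws support."""
        return "unsupported" if self.challenges else "plausible"


arg = ProvenanceArgument(
    claim="Emergency response may be using local water supplies",
    pattern="P_s: report based on field observations and water samples",
)
print(arg.status())
arg.challenge("CQPV3", "the Crisis_Report is not timely")
print(arg.status())
```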
To integrate the provenance elements into the analysis, CISpaces extracts and shows available patterns P m to the analyst. The analyst can choose a pattern deemed important for a specific part of the analysis in the WorkBox. As with crowdsourced evidence, CISpaces provides an argumentative method to import this into the analysis via an LPV argument scheme. The conclusion already exists in the WorkBox since p j concerns an Info or a Claim node, and the premises of LPV form a Pro-link to provide additional evidence for p j . This is the case in Fig. 9, where the pattern P s allows us to instantiate an argument from provenance whose premises are p 30 and p 31 , representing respectively P s and the warrant which justifies the credibility of p j on the basis of P s . Claims p 30 and p 31 then provide additional information on why we should believe that "Emergency response may be using local water supplies" through a link p 30 , p 31 →_p p 10 . Provenance data supporting a claim might be helpful in further stages of the analysis, and might demonstrate to other analysts that this information was considered important. On the other hand, a pattern P m may be a reason for believing that p j is not credible, based upon reasons expressed by CQPV3 or CQPV4. As discussed in the more general argumentative process (Section 4.1.1), a negative answer to one of the critical questions triggers a new Con-link being formed, representing an attack on the premises of LPV, and therefore indicating that p j would not be supported. In our example, looking closely at the provenance graph of Fig. 9, we notice from the timestamp attributes of wasGeneratedBy for the Crisis_Report that this report appears to be one year old, and is therefore unlikely to be relevant to the current crisis, raising critical question CQPV3. Claim p 33 provides support for the critical question CQPV3 (p 32 ), which, in turn, undermines p 31 (p 32 →_c p 31 ) and consequently p 10 . Extended patterns looking at the timeliness of information could also be considered to assess the credibility of a given piece of information automatically, as discussed in our previous work (Toniolo et al., 2014).
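A timeliness check of the kind suggested by the Crisis_Report example could be sketched as follows. The function name, the 30-day threshold and the timestamps are assumptions made purely for illustration; the text does not specify how timeliness would be measured.

```python
from datetime import datetime, timedelta


def timeliness_challenge(generated_at, crisis_start, max_age=timedelta(days=30)):
    """Return a CQPV3-style reason if the entity is too old, otherwise None.

    generated_at: timestamp taken from a wasGeneratedBy relation.
    crisis_start: reference time of the situation under analysis.
    max_age:      hypothetical tolerance; not a value from the paper.
    """
    age = crisis_start - generated_at
    if age > max_age:
        return f"CQPV3: entity generated {age.days} days before the current crisis"
    return None


# Illustrative timestamps only.
print(timeliness_challenge(datetime(2014, 6, 1), datetime(2015, 6, 1)))
print(timeliness_challenge(datetime(2015, 5, 25), datetime(2015, 6, 1)))
```

A non-empty result would correspond to a claim such as p 32 being added with a Con-link against the warrant of the provenance argument.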
With the process suggested above, CISpaces supports analysts in extracting relevant provenance information to be consumed in the process of reviewing the credibility of evidence and hypotheses. Indeed, looking at the provenance of the information p 10 , namely that "Emergency response may be using local water supplies", and at the arguments that we can extract from it (shown on the right side of Fig. 9), we can conclude that p 10 should not be acceptable, as it is based on a substantially flawed process. This has far-reaching effects: looking at Fig. 7, accepting p 10 is instrumental to accepting p 18 , which in turn is necessary for one of the three hypotheses explaining the situation. By knowing that there is a reason to believe that p 10 (and thus p 18 ) is not the case, as the piece of information is not timely, the hypotheses explaining the situation now become:
• Hypothesis 1:
- Pumping station explosion because of Gas (p 11 =IN & p 13 =OUT)
Note that in Hypothesis 1 we no longer have an explanation for people's illness (p 18 =OUT & p 17 =OUT), which would then require further investigation if Hypothesis 1 is to be taken forward.
To conclude, we created a new argumentation scheme for automatically incorporating relevant patterns linked to the provenance of information. In this way, we effectively support analysts in establishing the credibility of hypotheses, as demonstrated using our running example.

System implementation
CISpaces is a web-based system. The CISpaces interface is developed in Python 2.7 and deployed through the Kivy framework (The Kivy Community, 2011). CISpaces clients communicate with each other and may share analyses using a pub-sub architecture provided by ZeroMQ (The ZeroMQ Community, 2007), which is backed by the CISpaces ZMQ share messaging server. A database server is used for persistence: in our case the Gaian Database (Vyvyan et al., 2015), a dynamically distributed federated database, which allowed us to connect with other services developed within the broader scope of this project (e.g., Toniolo et al., 2016, see Section 6.2). The AI support is provided by a series of RESTful services developed in Java (Oracle, 1996) and deployed via an Apache Tomcat server (The Apache Software Foundation, 2002). Exchanges are supported via structured JSON data (Ecma International, 2017). These services include:
1. The evidence reasoning service, responsible for the core evidence-based sensemaking tasks (see Section 4.1). This includes, for example, the identification of hypotheses, where a call to the service is made every time the analyst intends to evaluate the current analysis and the results are displayed in the interface. The current view of the argumentation framework is posted to the service structured according to the Argument Interchange Format (AIF, Cerutti et al., 2018c).
2. Two crowdsourcing services to handle the crowdsourced evidence (see Section 4.2): one for the collection of crowdsourced data, one for the analysis of the results.
3. Two provenance services: one for recording, storing and retrieving provenance data, one for the analysis and visualisation of provenance records as discussed in Section 4.3. All provenance data is stored in the Gaian Database and is handled and queried via the Apache Jena framework (The Apache Software Foundation, 2010). The provenance is RDF-compliant following the PROV-O ontology (PROV Working Group, 2013).
4. A simulated information retrieval service demonstrating how a stream of information may flow into the system.
In Fig. 10 we depict the CISpaces architecture. Note that the evidence reasoning service is currently openly available as part of the newer open source CISpaces.org (Cerutti et al., 2018b), see Section 6.2.
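To give a flavour of the kind of structured JSON exchanged with the evidence reasoning service, the sketch below serialises a fragment of the running example as a graph of information (I) nodes and scheme nodes (RA for inference, CA for conflict), in the spirit of AIF. The field names, node identifiers and node texts are assumptions for illustration only; the actual schema used by CISpaces is not given in this section.

```python
import json

# Hypothetical AIF-inspired fragment: I-nodes carry claims, RA/CA nodes
# stand for a Pro-link (LPV) and a Con-link (CQPV3) respectively.
NODES = [
    {"id": "p10", "type": "I", "text": "Emergency response may be using local water supplies"},
    {"id": "p30", "type": "I", "text": "Provenance pattern P_s for p10"},
    {"id": "p31", "type": "I", "text": "P_s is a reason to believe p10"},
    {"id": "p32", "type": "I", "text": "The Crisis_Report is not timely"},
    {"id": "ra1", "type": "RA", "text": "LPV"},
    {"id": "ca1", "type": "CA", "text": "CQPV3"},
]
LINKS = [("p30", "ra1"), ("p31", "ra1"), ("ra1", "p10"),
         ("p32", "ca1"), ("ca1", "p31")]


def to_payload(nodes, links):
    """Validate that links reference known nodes and serialise to JSON."""
    known = {node["id"] for node in nodes}
    for src, dst in links:
        if src not in known or dst not in known:
            raise ValueError(f"link {src}->{dst} references an unknown node")
    body = {"nodes": nodes, "edges": [{"from": s, "to": d} for s, d in links]}
    return json.dumps(body, indent=2)


print(to_payload(NODES, LINKS))
```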

Expert evaluation of CISpaces
In this section, we discuss how the AI techniques we developed and implemented in CISpaces may advance performance in intelligence analysis thanks to an evaluation of CISpaces with subject-matter experts.
Our key question in this evaluation is "Would CISpaces be adopted by professional analysts?". We intend to study: a) whether analysts consider CISpaces useful in supporting the analysis process, and b) what characteristics of CISpaces would influence its adoption. In Section 3, among our objectives, we discussed the aim of developing a system that supports the processes underpinning analysis by integrating and aligning with analysts' methods, in order for the system to be acceptable to and adopted by practitioners. In this evaluation, we demonstrate that analysts indeed believe that CISpaces is valuable in this respect.
In the following subsections, we provide information on the methodology, hypotheses and experimental settings (Section 5.1). The quantitative results are reported in Section 5.2. We follow with a discussion of these results complemented by a qualitative analysis in Section 5.3.

Questionnaire and methodology
We ran our empirical study using a questionnaire tool to investigate the analysts' response to a potential introduction of CISpaces for routine activities and its effects on the intention to adopt CISpaces in future. The questionnaire follows an adaptation of the Technology Acceptance Model (TAM) as proposed by Davis (1989) and its subsequent versions (TAM2, TAM3) (Venkatesh and Bala, 2008; Venkatesh and Davis, 2000). This model has been developed within the Human-Computer Interaction literature to assess the acceptability of an information system according to various factors³ measured by indicators,⁴ and uses such factors to predict the potential adoption of such a system and its use (Legris et al., 2003; Park, 2009; Wu and Wang, 2005). Factors are often hard to measure directly and, therefore, in TAM indicators are used as an indirect measure of the effects of these factors.⁵ Indicators represent observable characteristics of a system, and their analysis alone is useful to identify whether analysts consider these as positive characteristics of CISpaces with respect to their current activities. In addition, this model provides us with a systematic method, through the analysis of relationships between factors (using PLS-PM as described below), to determine strengths and weaknesses of CISpaces which might influence its adoption and use.
In TAM, one of the key factors is Behavioural Intention (BI), in this case the intention to adopt CISpaces, which is influenced by the Perceived Usefulness (PU) and by the Perceived Ease of Use (PEOU) of the system. Factors BI, PU and PEOU are the core predictors of Use Behaviour, which represents how likely it is that analysts will use the system in the future. TAM3 extends the list of factors by introducing additional external factors influencing PU and PEOU. While maintaining the core TAM components (BI, PU, PEOU), in this research we adapt the list of external factors, introducing some indicators more relevant to our study. We reduce existing lists to those focused on a potential adoption of CISpaces at an early stage of development and exclude those directly focused on actual usability, since analysts' direct experience with CISpaces is limited. In addition, due to the limited number of participants and the reduced number of indicators, some indicators are regrouped into more general factors.
We adapted our model considering the following three themes: Analysts' Experience with similar tools, perceived Utility of CISpaces and its potential for Adoption in daily activities.From these themes we selected and introduced factors and relationships forming an adapted model, referred to as TAM-A.
Experience. In Section 3, the focus group highlighted that analysts use tools to support their activities, for organising input information, for collaboration, sensemaking and reporting. Here we are interested in understanding whether the analysts' experience (GEX) with similar tools has a positive influence on how they perceive CISpaces' ease of use.
Utility. TAM3 external factors are pertinent to our evaluation to establish whether CISpaces features are useful in improving daily activities. Key to establishing the usefulness of CISpaces in our evaluation are the perceived improvement in output quality (OQL), where the output in our work is the analysis, and the result demonstrability and relevance of CISpaces to the analysts' tasks (GRE). Perception of external control and computer self-efficacy (GPS) may positively contribute to ease of use.
Adoption. The TAM core factors BI, PU and PEOU will reveal whether there are grounds for CISpaces to be adopted in analysts' daily activities. Relationships between BI, PU, PEOU and other external factors might indicate strengths that can be exploited or weaknesses that need addressing in further developments of our system.
In Fig. 12 we show the resulting TAM-A graphical model with a description of each factor. Links to previous relevant TAM3 factors are shown in the figure.

Experiment settings
The participants were six expert analysts from the UK and the US, who consented to participate in this academic study and to have their opinions analysed and reported in publications. These participants were different from those involved in the study presented in Section 3. While we recognise that the participant sample is relatively small, this is a highly expert group of participants in a field where recruitment is challenging.
Our experiment proceeded as follows. Participants were asked to watch a 10-minute video demonstration of the CISpaces tool using a motivating scenario similar to our running example. The video showed step by step how to create an analysis, the use of argumentation schemes, crowdsourcing, provenance and the automatic evaluation of hypotheses. After watching the video, participants were asked to respond in writing to a set of closed and open questions using the questionnaire tool (TAM-A). The questions were provided to the analysts in a semi-randomised order with respect to the indicators of TAM-A. In total, participants were asked to respond to thirty-five multiple choice questions, evaluated using a 5-point Likert scale, and seventeen related open questions aimed at gathering further information on specific system features. Questions are provided in Appendix D.2. We believe this methodology was suitable for our research questions, the participant sample and the participants' limited engagement time available for our experiments.
The system used for evaluation is as described in Section 4. It developed from our previous version (Toniolo et al., 2015) in its interactive components (including more robust and reliable integration of crowdsourcing and provenance analyses, and collaborative features) and in additional AI functionalities (including preference handling and additional information retrieval; see Toniolo et al., 2016; Toniolo et al., 2014). Note that these latter additional functionalities, however, are out of the scope of this research and have not been used for evaluation.

Hypotheses
In formulating our TAM-A questionnaire, we considered what insight could be derived from indicators of characteristics of CISpaces alone, and from the relationships between factors constructed from indicators. We, therefore, identified two hypotheses for the study:
Hypothesis 1. Analysts respond positively to indicators demonstrating that CISpaces is considered useful in supporting the analysis process.
Hypothesis 2. All factors (OQL, GRE, GPS, GEX, PU, PEOU) have a positive effect on the degree of intention (BI) to use CISpaces.
Positive evidence for our hypotheses would support our general hypothesis that analysts are likely to intend to use CISpaces in their daily activities.This is based on the assumption (Venkatesh and Bala, 2008;Wu and Wang, 2005) that behavioural intention of adopting CISpaces would have a generally positive effect on actual use if this was to be deployed in future intelligence systems.
The analysis of results proceeded as follows. For H1, the factors extracted from the TAM-A model are analysed individually to establish analysts' response to the introduction of CISpaces in routine activities.
For H2, following common research on TAM models (Venkatesh and Bala, 2008), the data collected was analysed using the Partial Least Squares Path Modelling approach (PLS-PM) (Lohmöller, 1989). This method combines factor analysis and regression by attempting to build correlations between the nodes of the graphical model in Fig. 12 along their edges.⁶ Our assumption is that all edges in Fig. 12 represent a positive influence, and we number each edge as a sub-hypothesis. For example, H2.1 indicates the hypothesis that the perceived improvement of Output Quality has a positive influence on Perceived Usefulness. We note that the PLS-PM procedure was run with a very small sample, imposing important limitations on the estimation power and on validating statistical significance. Therefore, this analysis only provides limited suggestions on the strength of the contribution of the factors to the degree of intention to use CISpaces.
³ Other authors use the terms constructs and latent variables; we will consistently use factors in this paper.
⁴ Other authors use the term determinants; we will consistently use indicators in this paper.
⁵ All factors are measured in a reflective way in this research.
⁶ The analysis was run using the R (The R Foundation, 2004) package plspm (Sanchez, 2013; Sanchez et al., 2015).
To complement our hypotheses, we analysed answers to open questions and we provide quotes to support our claims. Given the short nature of the text provided, answers were simply coded as positive comments, negative comments, or explanations.
Our results from the quantitative analysis are reported briefly below (see further details in Appendix E), and we then follow by contextualising our qualitative and quantitative results according to the themes that have guided the development of TAM-A: Experience, Utility and Adoption.

Results
Indicators. Fig. 11 summarises the results gathered from analysing the answers to the questionnaire using medians and related interquartile ranges for a coding 1-5, where 1 indicates strong disagreement and 5 indicates strong agreement; red and blue are used to show the two respective polarities. Detailed results are provided in Appendix E.1. In order to understand whether the questions given to analysts measured the same factor (agreement with the statement), a Cronbach's alpha (Cronbach, 1951) was computed on the full results. We obtained a value of 0.89, indicating a high level of internal consistency for the scale used.
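For readers unfamiliar with these summary statistics, the following Python sketch shows how a median, an interquartile range and Cronbach's alpha can be computed for Likert-coded answers. The response matrix is invented for illustration and is not the study data; the quartile method (the default of `statistics.quantiles`) is also an assumption, since the method used in the study is not stated here.

```python
from statistics import median, quantiles, variance


def median_iqr(scores):
    """Median and interquartile range of one indicator's Likert scores."""
    q1, _, q3 = quantiles(scores, n=4)  # default 'exclusive' method
    return median(scores), q3 - q1


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of participants by columns of items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Illustrative data only: six participants, three items, coded 1-5.
RESPONSES = [
    [4, 4, 5],
    [3, 4, 4],
    [5, 5, 5],
    [2, 3, 3],
    [4, 4, 4],
    [3, 3, 4],
]

first_item = [row[0] for row in RESPONSES]
print(median_iqr(first_item))
print(round(cronbach_alpha(RESPONSES), 2))
```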
Questions related to experience have varied medians, indicating more or less experience with a specific method similar to that used in CISpaces. For the remaining questions, we note that the median of most answers is above neutral (coded with value 3), which indicates general agreement with the statements. Our results provide positive evidence for Hypothesis H1: CISpaces is a useful tool to support analysts' tasks.
Factors and Relationships. PLS-PM runs in two phases: the first establishes the viability of the measurement model and evaluates the correlations between the indicators and the factors they represent; the second evaluates the structural model, that is, the hypothesised relationships between factors. The strength and direction of the relationships obtained are shown in Fig. 13; for convenience in the explanation, we use the arrows ↑ and ↓ to represent positive and negative influences respectively. The figure also reports the regression weights and the coefficients of determination of the factors, R². The effects between all factors except PEOU→BI are statistically significant at p < 0.05, subject to the limitations indicated above, and high values of R² indicate that most of the variance in PU, PEOU and BI can be explained by their independent factors. We obtain positive influences for the relationships H2.2: GRE→PU ↑, H2.4: GEX→PEOU ↑ and H2.6: PU→BI ↑, and negative influences for the relationships H2.1: OQL→PU ↓, H2.3: GPS→PEOU ↓ and H2.5: PEOU→PU ↓. H2.7: PEOU→BI ↓ is negative but not significant. While revealing information on the strength of these relationships, this analysis showed that the model created is limited in representing the data collected and in predictive power, with limitations in both the measurement model and the structural model: indicators only partially represent their factors and, to an extent, the relationships contradict common TAM results. We believe this is due to the limited sample size. Interpreting these values requires caution due to these limitations; hence the conclusions we can draw are tentative observations used to complement the analysis. Further information on this analysis is provided in Appendix E.2.
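The structural model underlying these sub-hypotheses can be written down as a path matrix, the lower-triangular form commonly used to specify PLS-PM models, in which a 1 in row i and column j states that factor j is a predictor of factor i. The Python sketch below builds this matrix from the seven relationships; it reads H2.7 as PEOU→BI, in line with the significance statement above, and the factor ordering and function names are choices made for this example rather than part of the original analysis.

```python
# Factors ordered so that every hypothesised path points "forward".
FACTORS = ["OQL", "GRE", "GPS", "GEX", "PEOU", "PU", "BI"]

# Sub-hypotheses as (predictor, outcome) pairs; H2.7 is read as PEOU -> BI.
PATHS = {
    "H2.1": ("OQL", "PU"),
    "H2.2": ("GRE", "PU"),
    "H2.3": ("GPS", "PEOU"),
    "H2.4": ("GEX", "PEOU"),
    "H2.5": ("PEOU", "PU"),
    "H2.6": ("PU", "BI"),
    "H2.7": ("PEOU", "BI"),
}


def path_matrix(factors, paths):
    """Build a lower-triangular 0/1 matrix: row = outcome, column = predictor."""
    position = {name: i for i, name in enumerate(factors)}
    matrix = [[0] * len(factors) for _ in factors]
    for predictor, outcome in paths.values():
        if position[predictor] >= position[outcome]:
            raise ValueError("factors must be ordered so that paths point forward")
        matrix[position[outcome]][position[predictor]] = 1
    return matrix


def predictors_of(factor, factors, matrix):
    """List the factors with a direct path into the given factor."""
    row = matrix[factors.index(factor)]
    return [name for name, flag in zip(factors, row) if flag]


MATRIX = path_matrix(FACTORS, PATHS)
for name, row in zip(FACTORS, MATRIX):
    print(f"{name:5s}", row)
print(predictors_of("BI", FACTORS, MATRIX))
```

Estimating the path coefficients themselves requires the indicator data and a PLS-PM implementation such as the R package used in the study; this sketch only captures the hypothesised structure.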

Fig. 11. Technology Acceptance Model TAM-A adapted for the study.
A. Toniolo et al.

Evaluation discussion
We now discuss the results of the sub-hypotheses associated to the three groups of factors (Experience, Utility, Adoption) by analysing quantitative results and answers to the open questions.

Experience: Does analysts' previous experience align with CISpaces features?
The first set of results, on the left-hand side of Fig. 11, shows the indicators of analysts' experience (GEX): the first five indicate experience with analytical or software tools similar to those employed in CISpaces in previous work, while the subsequent five indicate current similar experience. With the exception of crowdsourcing (GEX4), analysts had previous experience with computer-mediated analytical tools (GEX0), particularly for argument mapping (GEX1), provenance recording (GEX2), and collaborative analytical tools (GEX3). This shows that our target expert participant sample is familiar with similar tools. With respect to the question of whether those tools are currently used in participants' daily jobs, the results are more scattered. Analysts often use tools for collaboration in particular (GEX8) and provenance analysis (GEX7), as also highlighted in the focus group (Section 3), while they use argument mapping tools (GEX6) and crowdsourcing (GEX9) much less frequently.
The analysts' answers to our open questions informed us of similarities and differences of CISpaces compared to other analytical tools. Some examples are reported below:
• Analyst E: "There are small similarities, there are other tools that seem more robust but they are not exactly like CISpaces."
• Analyst F: "CISpaces incorporates some features of other analytical platforms, but clearly goes much further. The provenance support is unique in my experience."
These quotes show agreement on similarities with other tools, in particular with link analysis tools (see Section 2). This comparison also shows drawbacks, some due to CISpaces being a research-level prototype which would require a more reliable and robust infrastructure for deployment. In answering these questions, it is also important to note that the system was directly compared by analysts with fully deployed, commonly used commercial tools, highlighting intention and potential for adoption.

Utility: Do analysts believe that the features of CISpaces are useful in improving daily activities?
The second group of results in Fig. 11 reports factors about features of CISpaces: improvement in output quality (OQL); relevance and result demonstrability (GRE); and ability to control and use CISpaces (GPS). Overall, analysts believe that CISpaces provides satisfactory features to fulfil these requirements, as shown by medians mostly placed in the agreement part of the graph.
More specifically, there is evidence for the following factors:
• OQL: Analysts agree that CISpaces has the potential to improve and facilitate the analysis process.
• GRE: Analysts agree that CISpaces is relevant, important, and pertinent to their daily activities. There is also general agreement among analysts on being able to identify and explain the useful characteristics of CISpaces.
• GPS: There seems to be agreement on the perceived control of CISpaces, which is considered easy to use (GPS2). Analysts highlight that the system may not be compatible with other systems (GPS3), as expected of a research-grade prototype. There is disagreement on whether the system would change the way analysts work in daily activities (GPS4). Analysts' daily job can be completed using CISpaces (GPS0), although training would be important (GPS1). The PLS-PM analysis confirms that GPS is the factor least well represented by its indicators.
Output Quality. The perceived improvement in output quality was investigated further, as this is an important reason for adopting the CISpaces solutions as novel approaches to intelligence analysis. To formulate the specific questions regarding this factor (OQL), we identified, through our previous focus group (Section 3) and analysis of the literature (Section 2), five criteria as key to assessing analysis: time, robustness, confidence in the analysis, expression of intent, and decision-making over plausible hypotheses (OQL0-OQL4). Fig. 11 shows that analysts agreed with the proposition that CISpaces provides improvements across all these dimensions. In response to whether there are any further criteria to consider, analysts highlighted in particular the confidence in the source documentation and the recognition of limitations of the available information, to avoid analysis based only on existing data.

Fig. 12. Results Summary for all groups.
We further asked analysts to define a robust analysis to better understand what contributes to this process:
• Analyst A: "Amount of data used and experience of the analyst."
• Analyst B: "Robustness means details for me."
• Analyst C: "Analysis proven by a large amount of information."
• Analyst D: "Conclusions arrived at via logical analysis based on solid data. The uncertainty is noted in the report, along with possible alternatives."
• Analyst E: "The assurance that analysis has been subjected to critical review and been found to rest upon good data and good assumptions; undertaken according to valid methodology."
• Analyst F: "Audit trail of the conclusion and how it was developed."
These answers show the importance placed on the rigour of the sensemaking process of analysis. Complementing the results on output quality, these answers give some positive indication that the support offered by CISpaces can contribute to the analysts' objectives and priorities during analysis in their daily activities.
Time. We note that there are conflicting views on whether the use of the system may, to a certain extent, limit the speed of the work (which is a critical characteristic for analysis, as suggested in Section 3), particularly as it requires recording all analytical processes and may require additional training to use the argument mapping system. This is visible in the disagreement of GPS indicators but also in the scores of Output Quality (OQL0), where time is the only dimension with lower scores, albeit positive, highlighting a similar point: creating a visualisation of the analysis comes with a cost. When discussing disadvantages, analysts expand on this issue:
• Analyst A: "A weakness is in the time it would take to create the visualisation. It might be a better tool for training intelligence analysts."
• Analyst B: "Having to write them all nodes out and build the diagrams will increase the time required tenfold, and time is the one thing that analysts don't have."
Tradeoffs between the advantages of graphical representations of analysis and the time and effort required to create them are an active research problem in the area of visual representations, which highlights that different representation types determine what information can be perceived (Zhang and Norman, 1994) and that different media may have advantages and disadvantages in providing contributions (Robinson and Pardoe, 2021). Although beyond the objectives of this research, these tradeoffs would need consideration in interface design and further studies in future deployments of CISpaces.
Training. We additionally asked analysts about the perceived burden of training to use CISpaces. All analysts reported that a training module and manual are fundamental to being able to use the system. Furthermore, analysts highlighted other important training requirements:
• Analyst A: "Understanding argumentation theory and analysis of competing hypotheses."
• Analyst B: "Determining how the information would flow into CISpaces is complicated as it depends on the type of analysis. In general, it is an easy to understand software and should have a short training requirement for analysts which are computer literate."
Processes. Having discussed the general perspective of analysts' views on CISpaces, we also asked, through open questions, specific questions regarding the advantages in relation to the three core processes where automated support is provided: sensemaking, provenance and crowdsourcing. In highlighting strengths of the support to the analysis process, analysts mention:
• Analyst B: "The strength is the visualisation of two (or more) hypotheses. Seeing where each hypothesis is supported can help determine which is the better choice."
• Analyst D: "An individual's reasoning is documented for others to follow/ collaborate on."
• Analyst E: "It serves as a forcing function. [to structure the analysis]"
For sensemaking, there is consensus that the system would help share world views between teams and, while reservations remain with respect to the time taken to create the visualisation, mid- to long-term analyses would particularly benefit from this approach. Analyst C suggested that CISpaces allows others to see the reasoning behind a hypothesis, so that everyone can view what an analyst was thinking when they were trying to understand the situation and could better identify questions. Analyst A added: "It lays out visually for everyone to see where there are issues."
With respect to crowdsourcing, analysts agree that this is a useful capability, bringing new information to the analysis. Opinions diverge more on the care that should be taken to control the reliability and expertise of the crowd to avoid potential misinformation. Active research in the area of crowdsourcing focuses on establishing and ensuring the reliability of reports, and could be integrated with CISpaces (e.g., Ouyang et al., 2016b).
Finally, Analyst A informed us of issues with provenance of the analysis: "Often a single piece of information can be reported in different ways causing an analyst to believe there are several separate pieces of information rather than a single one reported multiple times". All analysts agreed that tracking and analysing provenance is a very important feature, provided that it remains non-editable and unobtrusive unless requested, as currently designed in CISpaces.

Adoption: Do analysts believe that there are grounds for CISpaces to be adopted in their daily activities?
On the right-hand side of Fig. 11, we report the results for the last group. As for the previous questions, we notice general agreement among analysts on the perceived usefulness of the tool and on the intention to adopt the system. There are, however, some drawbacks in perceived ease of use (PEOU0, PEOU1).
To better understand the causes of these drawbacks and how they impact the general behavioural intention, we turn to the results obtained with the analysis of PLS-PM, in other words considering whether there are relationships between the factors that could inform positive or negative contributions to the intention of adopting CISpaces.
In relation to perceived ease of use, we found a moderate positive effect of experience with similar tools (GEX→PEOU ↑). This presumably means that the more experience analysts have with similar tools, the easier it would be for them to use CISpaces. Disagreement in perception of system control can hinder these results (GPS→PEOU ↓), however.
For perceived usefulness (PU), we note that the most influential contribution is provided by the relevance and result demonstrability factor (GRE→PU ↑), but ease of use negatively influences the results (PEOU→PU ↓). A relatively minor negative influence on PU is given by perceived improvement in output quality (OQL), which contradicts our assumption. OQL ratings are higher than PU, in particular with respect to robustness, accuracy, expressiveness, and decision-making, showing that there is confidence in the improvement perceived; this is likely to indicate that, to obtain more positive results in perceived utility, other features influencing ease of use need to be improved for adoption.
The most positive contributions to behavioural intention (BI) are provided by the perceived utility (PU→BI ↑) and, indirectly, by the relevance and result demonstrability factor (GRE). Ease of use negatively influences BI both directly and indirectly, even though the influence is relatively minor.
Corroborating our previous results, this shows that while the principled solutions provided by CISpaces are deemed important and relevant for the analysts' daily activities, it is equally important for adoption to ensure that the system is usable and integrated with other systems that analysts use. Further developments of CISpaces together with follow-up user studies are needed to draw conclusions on the interface usability aspects of the system, which is, however, beyond the scope of this evaluation.
Our last open questions to analysts were in relation to additional capabilities and applications of CISpaces. The possibility of automatically creating intelligence reports from the analysis, following some examples from previous research (Hossain et al., 2011), was envisaged as a potential additional feature of CISpaces, and when asked, all analysts agreed that this would be very useful. In terms of applications, analysts highlighted opportunities specifically for analysis with a medium to long term timeframe, such as strategic analysis (Analysts B, D and E), and for complex problems with many moving parts and numerous analysts collaborating (Analysts A, C, D and F). The use of CISpaces for training was highlighted as an important opportunity, as was its use to record the analytical process and ensure robustness (Analyst F).
When looking at the ability to share information, besides some specific interface and usability issues which could be addressed with further development to higher technology readiness levels, a concern emerged about security restrictions and limited bandwidth. Analysts also recommended attention when deploying CISpaces in working environments to ensure compatibility with systems the organisation has already adopted, to mitigate additional effort in training and usage.

Evaluation remarks
To conclude the discussion, our analysis suggests that the principled AI methods implemented in CISpaces have potential to advance performance in intelligence analysis. CISpaces has been designed to be a basic research prototype (TRL 3); nevertheless, during the evaluation analysts compared it with commercial systems they use every day, highlighting an intention to adopt the system. Analysts also agree that the AI methods implemented in CISpaces are useful in improving their daily activities, in particular thanks to the perceived utility of the outputs CISpaces generates. Analysts recommended that appropriate training and integration with other systems be provided. Tradeoffs have also been highlighted in the time required to build a visualisation, which inevitably has a cost. The highlighted drawbacks in CISpaces, however, do not lie in the AI methods underpinning the system: for successful adoption, CISpaces will need data integration with existing organisational standards both for the input and the output of information, as well as further advancements in the user interface.

Related work
Formal models of argumentation are used to capture different types of conflicts arising between information (Bex et al., 2003; Walton et al., 2008), to resolve these conflicts (Čyras and Toni, 2016; García and Simari, 2004; Modgil and Prakken, 2014), and to evaluate the reliability of conclusions (Parsons et al., 2011; Toniolo et al., 2014). Argumentation techniques, however, focus on decision-making, and such methods may require training to be used by analysts due to the extensive formalisation required. Argument mapping provides intuitive and effective support for critical thinking (Reed and Rowe, 2004; van Gelder, 2007), and shows advantages, particularly in enriching the understanding of a problem, over, for example, text representations (Carneiro et al., 2021), but does not offer support for reasoning. Argument mapping and formal argumentation can be combined to visualise and analyse arguments or conclusions (Leiva et al., 2019; Reed et al., 2017). In this work, we also combine these approaches to enable analysts to directly interact with and benefit from a computational model of argumentation in the construction and evaluation of hypotheses. We chose a subset of argumentation concepts and established a formal correspondence with intelligence analysis concepts, which were identified through a co-design process and verified via a focus group (see Section 3). This subset was intentionally small to limit the training burden. Recent research has focussed on establishing connections between formal argumentation and human intuition (e.g., Cerutti et al., 2014; Cramer and Guillaume, 2019; Toniolo et al., 2018) and can guide future work on studying formally how analysts' training approaches and methods align with formal argumentation models.
To support analysts in better selecting hypotheses, we employ crowdsourcing to facilitate the acquisition of additional evidence, and provenance to explore the credibility of information. In recent research, agent-based approaches have been applied to crowdsourcing to automate decision-making on behalf of the requestors, such as who to hire (Kamar et al., 2012), which is more akin to a trust decision-making problem. More traditional approaches focus on result aggregation to mitigate biases from unreliable sources (Brabham, 2008; Ouyang et al., 2016b; Whitehill et al., 2009). Similarly, work on provenance is primarily concerned with data quality and interoperability (Hartig and Zhao, 2009). In this research, we study how to automatically interpret provenance and crowdsourced data to assist analysts and integrate this information in generating coherent explanations of observed evidence.
Provenance is a novel application for argumentation-based frameworks. The approach that first discussed the use of arguments underpinned by provenance is by Chorley et al. (2008), where provenance is recorded for justifications provided by users during the assessment of policy options. Using provenance for assessment of information quality has also been explored. Hartig and Zhao (2009) proposed a measure of timeliness using a specific model of provenance, according to creation and access time. In our research, we provide a method to extract information from the provenance elements according to simple intuitive patterns. More complex quality measures could also be extracted, providing further automation to the analysis of provenance (Pipino et al., 2002; Toniolo et al., 2014).

Comparison with other tools
The AI techniques we implemented in CISpaces advance intelligence analysis across several dimensions: (1) visual exploration of relationships between pieces of information; (2) sensemaking and hypotheses generation; (3) evidence gathering via crowdsourcing; (4) provenance reasoning; (5) collaboration with other analysts. In related research efforts, we also experimented with social sensing (Toniolo et al., 2016) and with automatic information extraction from OSINT and report creation via natural language generation (Cerutti et al., 2019) via the spin-off CISpaces.org (Cerutti et al., 2018b).
To our knowledge, no other tool allows for such capabilities while ensuring a coherent and consistent analyst experience. There are, however, several tools that can be exploited for each of the previously mentioned capabilities.

Information collection and hypotheses generation
Existing visual analytics tools are primarily concerned with supporting the development of situational understanding by identifying links among, and structures present in, existing information. For instance, i2 Analyst's Notebook (IBM, 2017) offers a suite of views for analysts to organise and link information and perform sophisticated network analyses. Jigsaw (Stasko et al., 2008) enables analysts to explore different views of information available for decision-making, including viewing relationships among entities, documents, topics and visualising event/observation timelines. INVISQUE (Rooney et al., 2014), together with a number of other tools that regularly participate in the visual analytics challenge (Visual Analytics Community, 2006), offers a "suite" of perspectives over data that can be used by analysts to query and support sensemaking by facilitating access to evidence. Generally these tools are primarily focussed on organising and collating information for the analysis to take place. At the other end of the conceptualised model of intelligence analysis proposed by Pirolli and Card (2005), once analysts have formulated available hypotheses, a variety of tools are designed to support hypotheses evaluation and selection. For example, the Xerox PARC ACH tool (Stefik, 2014), the Open ACH (Burton and Knowles, 2010) and others (e.g., Tecuci et al., 2010) provide automated means to perform a weighted ACH, propagating uncertainties from evidence to hypotheses to weigh alternatives. As discussed in Section 2, a recent study from Baber et al. (2016) shows that systems available to analysts are limited in providing support for hypotheses exploration, and CISpaces is designed to address this gap.

Provenance reasoning
As discussed in Section 4.3, the origins of information (including information from the crowd), and how and by whom this information is interpreted during analysis, are important to establish the credibility of hypotheses. Provenance can be used to annotate how, where, when and by whom some information was produced (Moreau and Missier, 2013).
CISpaces is almost unique in its ability to support reasoning about provenance, to the point of semi-automatically refuting hypotheses on the basis of provenance information. Among the few other tools addressing this issue, TRELLIS (Gil and Ratnakar, 2002) enables information received from different sources to be suitably annotated, highlighting contradictions and how these relate to the trustworthiness of sources.

Collaboration with other analysts
Although we did not stress it in this paper, CISpaces allows for the creation of shared canvases so as to enable multiple analysts to operate on the same view of the analysis. Among other tools providing similar capabilities, CACHE is a collaborative ACH environment offering some support during the process of deciding upon the most likely hypothesis (Billman et al., 2006). The CACHE tool provides shared access to enable participants to weigh evidence as a team. The paper describing CACHE (Billman et al., 2006) also includes a user evaluation of the system, studying the effects of group composition in mitigation of biases through computer mediated ACH. The experiment shows the potential for collaborative systems to support the process of analysis and has influenced both our research and evaluation. Both CISpaces and CACHE appear to be useful for the work of analysts. Among other tools, Entity Workspace (Bier et al., 2008) supports collaboration in comparing and deciding upon the most likely hypothesis. POLESTAR (Pioch and Everett, 2006), instead, allows the sharing of an individual portfolio of analysis, enabling different users to make suggestions/critiques.

Information requirements, crowdsourcing and social sensing
To make sense of a situation, analysts need to rapidly analyse and link this information to other contextual evidence, to identify explanations of the environment. Gathering additional information is necessary to avoid the rejection of hypotheses on the basis of insufficient evidence (Heuer, 1999), as also highlighted by analysts in our evaluation (Section 5). Information requirements can be targeted by evaluating the value of information, and argumentation frameworks similar to that presented by CISpaces may be suitable for this purpose (Robinson and Pardoe, 2021).
The variety of sources that analysts must take into account has recently changed significantly, in particular with regard to open source intelligence (OSINT) such as social media. There are real challenges concerning how to exploit OSINT in effective and reliable ways; for example, the nature of social media sources is such that it is often difficult to distinguish between witness information and hearsay. Reliability of sources and reports is an important concern in these settings; CISpaces can be extended to include more complex aggregations of results to mitigate these issues (Ouyang et al., 2016a; 2016b) or with automated support in detecting those responsible for propagating misinformation (Paredes et al., 2021). Further, this should not be seen solely as analysts passively consuming open source intelligence, but also as utilising networks of contributors through crowd-sourced queries. For instance, public platforms have been shown to be useful sources of information in disaster response, for example in mapping the geography of Haiti after the 2010 earthquake (Zook et al., 2012). Social networks have created greater opportunities to leverage social sensing as a method to collect data about the environment, and in Toniolo et al. (2016) we demonstrated how conversational interfaces can be linked to CISpaces. In social sensing, people act as sensors, share information within a network or respond to data or opinion requests (Burke et al., 2006).
There are several other research directions for crowdsourcing in intelligence analysis. The role of crowd-sourced intelligence, and its classification within traditional or new parameters, is itself an active research topic (Stottlemyre, 2015). Recently, the IARPA CREATE project (IARPA, 2017) led the development of the SWARM Systems (Sinnott et al., 2019), a collection of platforms for integrating analytics techniques and informal argumentation to support analysis through crowd-sourced intelligence. The SWARM interface provides a portal which combines capabilities of question-answering platforms and shared document editing systems. Groups in the public and professional domains can contribute draft reports and opinions to intelligence requirements. A recent evaluation of SWARM demonstrates improvements in the quality of the reports provided when using the system (van Gelder et al., 2020), highlighting the potential for systems to include crowdsourced contributions.

Additional capabilities and CISpaces.org
Developments in natural language text analysis and computational linguistics have enabled greater automation in information extraction. These methods aid in the discovery of criminal groups, patterns of interaction, activity timelines, and so on. Event extraction and characterisation may also be provided from news articles (Lu et al., 2016), and other research has concentrated on the analysis of more complex texts. XIP-Cohere (De Liddo et al., 2012), for example, uses mixed automatic and human annotation to extract and summarise contrasting ideas from documents. Mining arguments from intelligence analysis reports has been studied in Kang and Sinnott (2018), following the significant developments in the area of argumentation mining seen in recent years (Lawrence and Reed, 2020). In CISpaces.org (Cerutti et al., 2018b), a spin-off of the project we are reporting in this paper, we employed natural language processing techniques for automatic information extraction from Twitter, thus demonstrating the effectiveness of linking the system to OSINT sources.
In addition, CISpaces.org (Cerutti et al., 2018b; 2019) also employs natural language generation techniques to produce explanations that could be included as reports. To our knowledge, only the Analyst's Workspace (Hossain et al., 2011) provides similar functionality.

Conclusions
In this paper, we illustrate how novel AI methods, based on a combination of argumentation theory, crowdsourced Bayesian analysis, and provenance recording, advance performance in intelligence analysis. Our research is based on an extensive consultation involving highly trained, professional intelligence analysts from UK, US, and international agencies in a process of elicitation of requirements, co-design and co-development of CISpaces and finally an evaluation of the approaches.
Recruiting experts in intelligence analysis is highly complex due to the nature of their role, and they are an extremely scarce resource, especially those who are highly trained and with extensive expertise. Due to this challenge, the number of participants in the two studies is limited; however, it is within the lower-end recommendations for qualitative research, specifically concerning purposive homogeneous studies (Guest et al., 2006; Miles et al., 2013), those being highly focused on analysts' needs with similar training background and work objectives.
Our experiments conclude that the novel, principled AI methods implemented in CISpaces may advance performance in intelligence analysis. During the evaluation, CISpaces, despite having been designed to be a basic research prototype (TRL 3), has been benchmarked against commercial systems used every day by analysts. Analysts agree that the AI methods implemented in CISpaces are useful in improving their daily activities, in particular thanks to the perceived improved utility of the outputs CISpaces generates. Analysts suggest that CISpaces has potential particularly for collaborative and complex analysis, for training novice analysts, and for maintaining an audit trail of the formation and selection of hypotheses. The analysts' evaluation highlights drawbacks in CISpaces that, however, lie not in the principled solutions, but rather in its interfaces with the data sources and with the user. For successful adoption CISpaces would need further integration with existing systems and further training. These aspects, while being essential for commercialisation, are beyond the scope of this paper.
In designing CISpaces, we worked closely with professional analysts to design a set of objectives for our system to help structure and record analysis, and to facilitate and improve the quality of analysis through automated support. Our supporting features include reasoning about plausible hypotheses through a small set of computational argumentation concepts, and support to analyse the provenance of information and to aggregate and report crowdsourced information. While CISpaces is unique in bringing these features together, this is a limited set addressing only some aspects of the analysis problems, and there are many directions in which a system such as CISpaces can be expanded. For example, in Section 6 we noted important current research trends in extracting and analysing large amounts of information from open sources such as social media or open crowdsourcing tasks. Current advancements in argument mining may help import arguments from secondary sources. Automatically establishing the credibility of this information and the likelihood of events would further inform analysis and has potential to improve analysts' tasks. Further and more complex policies for collaboration and integration of analyses would also be beneficial, as sharing intelligence is often a critical issue between organisations. Future work may focus on identifying autonomous methods to integrate evidence analysis themes more tailored to specific intelligence requirements, in similar ways as we import crowdsourcing and provenance in CISpaces, for example for geo-spatial data or event causality.
From an evaluation perspective, follow-up studies with analysts would be important to better understand the extent of the support that CISpaces provides, and to establish the level of training needed for analysts to model a problem in terms of argument components. Future work in this direction would provide a more in-depth evaluation of the potential for the use of this system as well as further insight into the most suited level of automation for analysis tasks. In future, the restricted set of argument components we have chosen for CISpaces can be extended (e.g., with preferences, strict rules, additional schemes and critical questions, or alternative semantics), supported by studies focussed on understanding tradeoffs between components and the training burden required to adopt these new concepts.
CISpaces is devised to support intelligence analysts in the military; however, there is a growing need for tools which support deep thinking and limit cognitive biases in a variety of disciplines, which may benefit from CISpaces support, from scientific enquiries (Cerutti and Pearson, 2018) to legal analyses (Cerutti et al., 2018a), demonstrating its versatility. CISpaces.org (Cerutti et al., 2018b) shows that a substantial part of the code we developed for CISpaces can be easily adapted to different graphical user interfaces. Our research and evaluation demonstrate the potential of an integrated tool building on state-of-the-art AI and argumentation-based techniques to aid human effort in better interpreting evidence in highly complex environments.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
We would particularly like to acknowledge the contribution made by the late Paul Sullivan to this work. Without his expertise, this research would not have been possible. We would like to thank the professional analysts from the UK, US and international agencies for their support in developing this research.
This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Appendix A. Mapping CISpaces to ASPIC+
Here we give the formalisation of the mapping from CISpaces to ASPIC+ (Modgil and Prakken, 2014) argumentation framework, which is restricted to ordinary premises and defeasible rules without preferences, and we discuss how the argumentation schemes are considered in this formalism (see Section 4.1).

A1. An ASPIC-like argumentation framework
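Definition 1. An argumentation system $AS$ is a tuple $\langle \mathcal{L}, \bar{\ }, \mathcal{R} \rangle$ where $\mathcal{L}$ is a logical language, $\bar{\ }$ is a contrariness function, and $\mathcal{R}$ is a set of defeasible rules. The contrariness function $\bar{\ }$ is defined from $\mathcal{L}$ to $2^{\mathcal{L}}$, s.t. given $\varphi \in \overline{\phi}$ with $\varphi, \phi \in \mathcal{L}$, if $\phi \notin \overline{\varphi}$, $\varphi$ is called the contrary of $\phi$, otherwise if $\phi \in \overline{\varphi}$ they are contradictory (including classical negation $\neg$). A defeasible rule is $\varphi_0, \dots, \varphi_j \Rightarrow \varphi_n$ where $\varphi_i \in \mathcal{L}$.
We refer to a rule $\alpha \Rightarrow \beta$ as $r$, where $\alpha$ is the antecedent and $\beta$ is the consequent.
Definition 2. A knowledge-base $K$ is a subset of the language $\mathcal{L}$. An argumentation theory is $AT = \langle K, AS \rangle$.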
An argument $Arg$ is derived from the knowledge-base $K$ of a theory $AT$. Let $Prem(Arg)$ indicate the premises of $Arg$, $Conc(Arg)$ the conclusion, and $Sub(Arg)$ the subarguments:
Definition 3. An argument $Arg$ is defined as:
Attacks are defined as those arguments that challenge others, and defeats are those attacks that are successful: we use only rebutting, when two arguments have contradictory conclusions; and undermining, when the conclusion of an argument is the contrary of a premise of another argument. Since we do not consider preferences, attacks are always successful. Moreover, while we do not explicitly encompass undercutting (when the conclusion of an argument is the contrary of a defeasible rule), it can be represented with the introduction of an additional premise, as often considered in the literature; see for example Dung et al. (2009), and Čyras and Toni (2016) for a discussion.
Definition 4. An argument $Arg_A$ defeats an argument $Arg_B$ iff:
An abstract argumentation framework (Dung, 1995) $AF$ corresponding to an $AT$ includes a set of arguments as defined in Def. 3 and a set of defeats as in Def. 4. Sets of acceptable arguments (a.k.a. extensions) in an $AF$ can be computed according to a semantics. The set of extensions that we consider here is $\xi = \{\xi_1, \dots, \xi_n\} \cup \{\xi_S\}$ such that each $\xi_i = \{Arg_a, Arg_b, \dots\}$. The extensions $\xi_1, \dots, \xi_n$ are the credulous-preferred extensions identified via preferred semantics; i.e., maximal w.r.t. set inclusion extensions that are conflict free (i.e., no arguments in any extension defeat each other), and admissible (i.e., each argument in the extension is defended against defeats from "outside" the extension). The skeptical-preferred extension $\xi_S$ is the unique intersection of the credulous-preferred extensions.
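The preferred semantics described above can be illustrated on a small abstract framework. The following is a minimal brute-force sketch in Python, assuming arguments are plain labels and defeats are ordered pairs; the function names and the three-argument example are hypothetical, and the enumeration is exponential, so it is not a substitute for the dedicated solver used in CISpaces.

```python
from itertools import combinations


def conflict_free(candidate, defeats):
    """No member of the candidate set defeats another member."""
    return not any((a, b) in defeats for a in candidate for b in candidate)


def defends(candidate, arg, defeats):
    """Every defeater of arg is itself defeated by some member of the set."""
    defeaters = {a for (a, b) in defeats if b == arg}
    return all(any((d, a) in defeats for d in candidate) for a in defeaters)


def admissible(candidate, defeats):
    return conflict_free(candidate, defeats) and all(
        defends(candidate, arg, defeats) for arg in candidate
    )


def preferred_extensions(arguments, defeats):
    """Admissible sets that are maximal w.r.t. set inclusion (brute force)."""
    ordered = sorted(arguments)
    adm = [
        frozenset(subset)
        for size in range(len(ordered) + 1)
        for subset in combinations(ordered, size)
        if admissible(frozenset(subset), defeats)
    ]
    return [s for s in adm if not any(s < t for t in adm)]


def skeptical_preferred(extensions):
    """Intersection of the credulous-preferred extensions."""
    return frozenset.intersection(*extensions) if extensions else frozenset()


# Hypothetical framework: A and B defeat each other, and B defeats C.
arguments = {"A", "B", "C"}
defeats = {("A", "B"), ("B", "A"), ("B", "C")}
extensions = preferred_extensions(arguments, defeats)
skeptical = skeptical_preferred(extensions)
```

In this example the two credulous-preferred extensions are {B} and {A, C}, and their intersection, the skeptical-preferred extension, is empty.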

A2. CISpaces argumentation theory
In CISpaces the core view where the analysts construct the analysis is called WorkBox. Here, we define the mapping of a WorkBox view to the corresponding AT, called WAT. A Pro-link in the WorkBox is textually represented as $[p_1, \dots, p_n \rightarrow_{p} p_\phi]$, indicating that the Pro-link has $p_1, \dots, p_n$ as incoming nodes and has the outgoing node $p_\phi$.
Definition 5. A WAS is an argumentation system $\langle \mathcal{L}, \bar{\ }, \mathcal{R} \rangle$ constructed as follows:
(i) $\mathcal{L}$ is a propositional logic language, and a node corresponds to a proposition $p \in \mathcal{L}$. The WAT set of propositions is $\mathcal{L}_w$.
(ii) The set $\mathcal{R}$ is formed by rules $r_i \in \mathcal{R}$ corresponding to Pro-links between nodes such that: [$p_i$ for a set of rules $r_1, \dots, r_n \in \mathcal{R}$ indicating a cycle (i.e. for all $p_i$ that are consequents of a rule $r$ there exists some $r'$ containing $p_i$ as antecedent), then
The mapping from WAT to the ASPIC+ framework is similar to that adopted between OVA+ and various solvers (Reed et al., 2017), with the exception of the mapping of Con-links (w.r.t. Def. 5(iii)) and inference cycles (w.r.t. Def. 6(i)). CISpaces stores data in the Argument Interchange Format (Cerutti et al., 2018c). In particular, the option for an analyst to write $p_1, \dots, p_n$ linked to $p_\varphi$ with a Con-link is mapped to the argumentation framework with a rule that has a contrary consequent to $p_\varphi$, whereas in other frameworks each individual $p_i$ with $i = 1, \dots, n$ is considered as a contrary to $p_\varphi$. Our approach allows the representation of a contrary of a term $(p_\phi, p_\varphi) \in \bar{\ }$ as in other models (for example, an unreliable messenger may be a contrary for them to be in a position to deliver a message); however, it also permits a compact representation of additional constraints. For example, inspired by Caminada and Wu's Tandem example (Caminada and Wu, 2011), we might consider three gangs, where each gang would only collaborate with another gang if the third is not involved. With a representation where a Con-link is created for a pair of every two gangs against the other, we obtain preferred extensions that include pairs of gangs, rather than a single gang. Additionally, premises of inferences that form a cycle in existing models are not considered part of the knowledge base. In our framework, we are able to distinguish between information and claim nodes, and we chose to consider info-nodes as asserted propositions that are part of the knowledge-base.
Hypotheses identification. In CISpaces we use a WAT as translation of a WorkBox to evaluate plausible conclusions and to show available hypotheses to the user.
Definition 7. Given an AF corresponding to a WAT, a proposition $p_i$ and an existing extension $\xi_j$, $p_i$ is acceptable if there is an argument $Arg_i \in \xi_j$ that has conclusion $p_i$.
CISpaces uses the efficient solver developed by Cerutti et al. (2016) to identify preferred extensions. Given the set of all extensions $\xi$ in the WAT, the analyst is presented with $n$ colouring options that indicate when a node contains a statement that can be supported, unsupported or undecided. A node is supported if it contains a piece of information that is acceptable or is defended against its defeaters. A node is unsupported if it is rejected, and undecided if it has insufficient grounds to be either supported or unsupported.
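One possible reading of this colouring can be sketched in Python. The rule used here is an assumption made for illustration, not the formal assignment used in CISpaces: a proposition is supported if some argument in the extension concludes it, unsupported if every argument concluding it is defeated by a member of the extension, and undecided otherwise. The names `colour_nodes` and `conclusion_of`, and the example data, are hypothetical.

```python
def colour_nodes(propositions, conclusion_of, extension, defeats):
    """Assign a colour to each proposition given one extension.

    Assumed rule (illustrative only):
      supported   - some argument in the extension concludes the proposition
      unsupported - every argument concluding it is defeated by the extension
      undecided   - anything else, including propositions with no argument
    """
    colours = {}
    for p in propositions:
        concluding = {arg for arg, conc in conclusion_of.items() if conc == p}
        if concluding & extension:
            colours[p] = "supported"
        elif concluding and all(
            any((d, arg) in defeats for d in extension) for arg in concluding
        ):
            colours[p] = "unsupported"
        else:
            colours[p] = "undecided"
    return colours


# Hypothetical example: D defeats itself, so it is neither accepted nor
# defeated by the extension {A, C}.
conclusion_of = {"A": "p1", "B": "p2", "C": "p3", "D": "p4"}
defeats = {("A", "B"), ("B", "A"), ("B", "C"), ("D", "D")}
extension = {"A", "C"}
colours = colour_nodes(["p1", "p2", "p3", "p4"], conclusion_of, extension, defeats)
```

Each extension yields one such colouring, so an analyst inspecting n extensions would see n alternative options over the same set of nodes.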

Definition 8. The set of options
X,?}}. The assignment of $col_i$ for $p_i$ given an extension $\xi_j \in \xi$ is:
The set of supported conclusions consists of the supported elements of an option O V i. Each option is available to the analyst for inspection, and represents the semantic mapping from extensions to hypotheses as partial explanations of a world.
An example mapping from a WorkBox argumentation theory (WAT) to an abstract argumentation framework (AF) and the set of options (in this case

A3. Mapping argumentation schemes
Let us recall once again the argument from expert opinion (Walton et al., 2008) from Section 4, completed with implicit premises for the conclusion to hold:
- Source E is an expert in domain S containing proposition A,
- E asserts that proposition A is true,
- Implicit: E is a credible source, E is reliable, there is evidence supporting A
⇒ Therefore, A may plausibly be true.

B1. Analysis of collected results
For categorical data we are interested in knowing the probability distribution $\pi$ of the categories of a multi-valued answer to question $q_k$. Since $q_k$ has $m$ possible outcomes, corresponding to the $m$ categories, answers to $q_k$ represent a discrete distribution parametrised by the vector $\theta = \langle \theta_1, \dots, \theta_m \rangle$, where $P(X = j \mid \theta) = \theta_j$ and $\sum_{j=1}^{m} \theta_j = 1$. Given $s$ the number of participants reporting, $X = \langle X_1, \dots, X_s \rangle$ is such that $\forall z$, $X_z \sim \mathrm{discrete}(\theta)$; and $n = \langle n_1, \dots, n_m \rangle$ collects, for each category $j$, the number $n_j$ of reports with $X_z = j$. The vector $n$ is known as a sufficient statistic for $\theta$ because it supplies as much information about $\theta$ as the original vector $X$ does.
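From Bayes' theorem, $P(\theta \mid n) \propto P(n \mid \theta) \cdot P(\theta)$. We conveniently choose as prior its conjugate, the Dirichlet distribution parameterised by $\alpha = \langle \alpha_1, \dots, \alpha_m \rangle$, $\alpha_j > 0$, of the form:
$$P(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{j=1}^{m} \alpha_j\right)}{\prod_{j=1}^{m} \Gamma(\alpha_j)} \prod_{j=1}^{m} \theta_j^{\alpha_j - 1}$$
In this paper, independently of the chosen prior, the reported vector refers to the resulting expected values for the $m$ categories of question $q_k$.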
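Because the Dirichlet prior is conjugate to the discrete likelihood, the posterior is again a Dirichlet distribution whose parameters are the prior parameters plus the observed counts, so the expected value of each category is $(\alpha_j + n_j) / \sum_{l=1}^{m} (\alpha_l + n_l)$. The following Python sketch illustrates this calculation; the function name, the uniform default prior ($\alpha_j = 1$) and the example answers are assumptions made for illustration, not values from the study.

```python
from collections import Counter


def posterior_expected(answers, categories, alpha=None):
    """Posterior mean of each category under a Dirichlet prior.

    answers    - list of reported categories (one per participant)
    categories - all m possible categories
    alpha      - prior pseudo-counts per category; defaults to a uniform
                 prior with alpha_j = 1 (an assumption of this sketch)
    """
    if alpha is None:
        alpha = {c: 1.0 for c in categories}
    counts = Counter(answers)  # the sufficient statistic n
    total = sum(alpha[c] + counts[c] for c in categories)
    return {c: (alpha[c] + counts[c]) / total for c in categories}


# Hypothetical reports from four participants on a three-category question.
expected = posterior_expected(
    ["yes", "yes", "no", "yes"], ["yes", "no", "unsure"]
)
```

With the uniform prior, a category that nobody reported still receives a small non-zero expected value, which avoids ruling out an answer on the basis of a handful of reports.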
For numerical data, we consider a weighted mean of the $s$ collected reports $Y_k$ for $q_k$. In the simplest case, weights $w_i$ are assumed to be 1, although these may vary according to features of the reports as for the prior probability. Then, for question $q_k$ we consider the weighted average of answers:
$$\frac{\sum_{i=1}^{s} w_i \, y_i}{\sum_{i=1}^{s} w_i}, \quad y_i \in Y_k$$
After the aggregation of responses, the results are introduced in the analysis using an adapted argument scheme from generally accepted opinion, of which the formalisation is provided in Fig. B.2.
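The weighted mean used for numerical answers can be sketched in a few lines of Python. The function name and the example reports are hypothetical; normalising by the sum of the weights is the standard form of a weighted average and is assumed here.

```python
def weighted_mean(reports, weights=None):
    """Weighted average of numerical reports; weights default to 1."""
    if weights is None:
        weights = [1.0] * len(reports)
    if len(weights) != len(reports):
        raise ValueError("one weight is required per report")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * y for w, y in zip(weights, reports)) / total


# Hypothetical reports: three estimates, the last one weighted twice as much.
unweighted = weighted_mean([10, 20, 30])
weighted = weighted_mean([10, 20, 30], weights=[1, 1, 2])
```

With unit weights this reduces to the arithmetic mean; unequal weights let reports judged more reliable contribute more to the aggregate.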

Appendix C. Provenance
In this section, we explain the formalism underpinning the recording and exploration of provenance of information presented in Section 4.3. The underpinning language for provenance we use in this research is the W3C standard PROV Model (PROV-DM, Moreau and Missier, 2013). PROV-DM records provenance in terms of entities, activities, and agents that have caused an entity to be, and it defines seven relationships between these elements (Fig. C.1). We refer to those with a prefix p-. An entity is a physical or conceptual thing such as a report or a piece of information; an entity may be derived from other entities. An activity represents a process that acts upon entities; e.g., extracting, creating entities. Entities are generated by an activity, and they represent resources that can be consumed (used) by other activities. An activity may inform another activity by triggering it to take place. An agent is something or someone responsible for an activity taking place such as a person, or a software tool. An agent may author an entity or it may act on behalf of other agents.
A record of provenance is formed by nodes (p-entities, p-agents, p-activities) and directed relationships between these nodes.Such a record can be represented as a directed acyclic graph.We may then explore these graphs using OPQL (Lim et al., 2013), a provenance query language that supports lineage queries.Our extension of OPQL for dealing with PROV-DM is presented in Toniolo et al. (2014), here we recall the main elements of this

formalism.
A Provenance Graph is a graph $G_P = (N, E)$ where a node $n$ can be of type p-entity $a$ (from a set $A_{pv}$), p-activity $ap$ (from a set $P_{pv}$), or p-agent $ag$ (from a set $Ag_{pv}$). The set $N$ is given by $N = A_{pv} \cup P_{pv} \cup Ag_{pv} = \{n_1, n_2, \cdots\}$. Similarly, an edge is labelled with the type of relationship among those defined in Fig. C.1, with $E = E_u \cup E_g \cup E_d \cup E_i \cup E_{aw} \cup E_{at} \cup E_b$, respectively representing: a p-activity $ap$ used $a$ ($E_u$); a p-entity $a$ was generated by $ap$ ($E_g$); a p-entity $a_1$ was derived from $a_2$ ($E_d$); a p-activity $ap_1$ was informed by $ap_2$ ($E_i$); a p-activity $ap$ was associated with $ag$ ($E_{aw}$); a p-entity $a$ was attributed to $ag$ ($E_{at}$); and a p-agent $ag_1$ acted on behalf of $ag_2$ ($E_b$).
Nodes $n$ and edges $e$ comprise a set of attribute-value pairs. Given a set of attributes $Att = \{attribute_1, attribute_2, \cdots\}$ and a set of corresponding values $Val = \{value_1, value_2, \cdots\}$, a mapping function $att : (E \cup N) \times Att \rightarrow Val$ associates a value to an attribute of an edge or a node. For example, the name Inf$_1$ of an entity $a_1$ is $att(a_1, name) =$ "Inf$_1$", and the time associated with a generation edge $e_1 = (a, ap)$ is $att(e_1, time) =$ "2020-01-22:T11-51-00".
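To make the definitions concrete, the following is a minimal Python sketch of a provenance graph with typed nodes, labelled edges and the attribute mapping att. The class and method names are illustrative assumptions and not part of CISpaces or OPQL; the relation labels are the seven core PROV-DM relation names. The node kind is stored under a separate key, "kind", so that the attribute "type" remains free for values such as "Primary Source".

```python
from dataclasses import dataclass, field

NODE_KINDS = {"p-entity", "p-activity", "p-agent"}
RELATIONS = {
    "used", "wasGeneratedBy", "wasDerivedFrom", "wasInformedBy",
    "wasAssociatedWith", "wasAttributedTo", "actedOnBehalfOf",
}


@dataclass
class ProvenanceGraph:
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # (source, target, relation) -> attribute dict

    def add_node(self, node_id, kind, **attributes):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = {"kind": kind, **attributes}

    def add_edge(self, source, target, relation, **attributes):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be nodes of the graph")
        key = (source, target, relation)
        self.edges[key] = dict(attributes)
        return key

    def att(self, element, attribute):
        """Attribute lookup for a node id (str) or an edge key (tuple)."""
        store = self.edges if isinstance(element, tuple) else self.nodes
        return store[element][attribute]


# Hypothetical example mirroring the one in the text.
g = ProvenanceGraph()
g.add_node("a1", "p-entity", name="Inf1")
g.add_node("ap1", "p-activity", name="extract")
e1 = g.add_edge("a1", "ap1", "wasGeneratedBy", time="2020-01-22:T11-51-00")
print(g.att("a1", "name"))  # Inf1
print(g.att(e1, "time"))    # 2020-01-22:T11-51-00
```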
In CISpaces, we have two datasets available, $I$ and $P$. The dataset $I = \{\cdots\}$ includes pieces of information, each of which corresponds to the information contained in a node $p_i$ created in the WorkBox. $P$ contains a graph of provenance data for the information in $I$.
Usual operations and properties of a graph apply; in particular, the union of two subgraphs is represented as $G_{P1} \cup G_{P2}$, where $G_{P1} = (N_1, E_1)$ and $G_{P2} = (N_2, E_2)$. A directed path is represented as $D_P(n_0, n_k) = (N, E)$ with nodes $N = \{n_0, \cdots, n_k\}$ and edges $E = \{e_0, \cdots, e_{k-1}\}$ such that $e_i$ is an edge directed from $n_i$ to $n_{i+1}$, for all $i < k$, and a shortest directed path is one where the cardinality of the edge set is the minimum.
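A shortest directed path in this sense can be found with a breadth-first search, since every edge counts as one step. The sketch below assumes the graph is given as a successor dictionary; the function names and the example node identifiers are hypothetical.

```python
from collections import deque


def shortest_directed_path(successors, start, goal):
    """Return the node sequence of a shortest directed path, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk back to the start node
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in successors.get(node, ()):
            if nxt not in parents:       # first visit is along a shortest path
                parents[nxt] = node
                queue.append(nxt)
    return None


def path_edges(path):
    """Edges e_i = (n_i, n_{i+1}) of a node sequence."""
    return list(zip(path, path[1:]))


successors = {"a1": ["ap1", "a2"], "ap1": ["a2"], "a2": ["ap2"], "ap2": []}
path = shortest_directed_path(successors, "a1", "ap2")
print(path)              # ['a1', 'a2', 'ap2']
print(path_edges(path))  # [('a1', 'a2'), ('a2', 'ap2')]
```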

Definition 10. (Provenance chain)
A provenance chain of a node $n_j$ in $P$ is a subgraph $G_P(n_j) = (N', E')$ of $G_P = (N, E)$ given by the union of all the directed paths from node $n_j$ in $P$ to a node $n_q \in N$ that does not have successors. The provenance chain $G_P(p_j)$ indicates a graph $G_P(n_j)$ of an entity node $n_j$ that is linked to information $p_j$ through $att(n_j, name) = p_j$. Henceforth, for convenience we will refer to $G_P(p_j)$ in general discussion, but the formalisation is presented in terms of the corresponding graph $G_P(n_j)$.
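One way to compute such a chain is sketched below, under the assumption that the provenance graph is finite and acyclic. In that case every node and edge reachable from the start node lies on some path that ends in a node without successors, so the union of all those paths is simply the subgraph reachable from the start node. The function name and the example edges are hypothetical.

```python
def provenance_chain(edges, start):
    """Nodes and edges of the subgraph reachable from `start`.

    edges: iterable of (source, target) pairs of a directed acyclic graph.
    """
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)

    chain_nodes = {start}
    chain_edges = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            chain_edges.add((node, nxt))
            if nxt not in chain_nodes:
                chain_nodes.add(nxt)
                stack.append(nxt)
    return chain_nodes, chain_edges


edges = [("a1", "ap1"), ("ap1", "a2"), ("a1", "a2"), ("a2", "ap2"), ("a3", "ap1")]
nodes, chain = provenance_chain(edges, "a1")
print(sorted(nodes))  # ['a1', 'a2', 'ap1', 'ap2']
```

The edge from a3 is excluded because a3 cannot be reached from a1.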
Given a provenance graph $G_P(n_j)$, a query to the provenance dataset $P$ in OPQL is made by using graph patterns and pattern matching.

Definition 11.
A graph pattern is a pair $P_m = (G_M, C)$, where $G_M = (N_M, E_M)$ is a graph motif and $C$ is a predicate on the attributes of the motif. A graph motif $G_M$ is a graph with a certain structure but where nodes and edges are identified by a variable.
A graph pattern $P_m = (G_M, C)$ is matched with a graph $G_P = (N, E)$ if there exists an injective mapping $\phi : N_M \rightarrow N$ such that:
(i) $\forall e(n_1, n_2) \in E_M$, the mapping $(\phi(n_1), \phi(n_2))$ is an edge in $E$ of $G_P$;
(ii) predicate $C$ holds in the mapping of $G_M$ in $G_P$.
The matched graph is a graph identified by $\langle \phi, P_m, G_P \rangle$ and referred to as $\phi_{P_m}[G_P]$.
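The matching conditions can be illustrated with a brute-force search over injective mappings. This is only a sketch for small graphs, not how OPQL evaluates queries: the function name is hypothetical, edge labels are ignored for brevity, and the predicate is assumed to be a Python callable that receives the candidate mapping.

```python
from itertools import permutations


def match_pattern(motif_nodes, motif_edges, predicate, graph_nodes, graph_edges):
    """Yield every injective mapping phi from motif nodes to graph nodes such that
    (i) each motif edge maps onto a graph edge and (ii) the predicate holds."""
    edge_set = set(graph_edges)
    # permutations() never repeats an element, so each phi is injective.
    for image in permutations(graph_nodes, len(motif_nodes)):
        phi = dict(zip(motif_nodes, image))
        edges_ok = all((phi[u], phi[v]) in edge_set for u, v in motif_edges)
        if edges_ok and predicate(phi):
            yield phi


# Hypothetical graph: an entity generated by an activity associated with an agent.
graph_nodes = {
    "a1": {"kind": "p-entity"},
    "ap1": {"kind": "p-activity"},
    "ag1": {"kind": "p-agent", "name": "Observer"},
}
graph_edges = [("a1", "ap1"), ("ap1", "ag1")]

# 2-node motif x -> y whose target must be the agent named "Observer".
matches = list(match_pattern(
    ["x", "y"], [("x", "y")],
    lambda phi: graph_nodes[phi["y"]].get("name") == "Observer",
    graph_nodes, graph_edges,
))
print(matches)  # [{'x': 'ap1', 'y': 'ag1'}]
```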
A graph pattern is a variable that permits the extraction of the structure required by the pattern. A 1-node pattern extracts all nodes that are named with a specific label (e.g., "Observer"). A 2-node pattern extracts an edge between two nodes. These 1-node or 2-node patterns are used to perform queries in order to extract a named node or a named edge with specific attributes. In CISpaces, we use three composed patterns to record a provenance chain for a piece of information and query it to extract schemes to be included in the analysis:
Extraction of information and updates: a pattern $P_g$ for generating entities takes two entities, $a_1$ and $a_2$, whereby $a_1$ was derived from $a_2$. Activity $ap_1$ was responsible for generating entity $a_1$ using $a_2$, and it was associated with actor $ag_1$.
Preparation of a document and primary sources: this is a source pattern $P_s$ where the centre of the provenance record is an activity $ap_1$ that generates the document recorded in entity $a_1$ and uses a number of sources $a_2, \cdots, a_n$. An important attribute qualifies an entity as the primary source, where $att(a, type) =$ "Primary Source". Primary sources are those that first reported or created the information.
Intelligence requirement or goal of analysis: this pattern $P_t$ is fundamental for recognising the goal of the analysis, which may also be called an intelligence requirement or a request for information. $P_t$ denotes the triggering activity $ap_2$ that caused activity $ap_1$ to be executed. Goals are marked through the predicate $C$: $att(a_3, type) =$ "Goal".
The structure of these three patterns is represented in Fig. C.2. In CISpaces, when the provenance of a node $p_j$ in the WorkBox is inspected, the system queries its provenance chain $G_P(n_j)$ by finding all corresponding matched parts of the graph $\phi_{P_m}[G_P(n_j)]$ for each of these three patterns $P_m \in \{P_g, P_t, P_s\}$. The resulting matched patterns are shown to the analyst, who can choose to bring a specific matched pattern of interest into the WorkBox in the form of an instantiated argument scheme. In Fig. C.3 we provide a formalisation of this scheme using predicate-like labels as above.
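As an illustration of one of these composed patterns, the sketch below queries the source pattern directly: it finds the activity that generated a document entity, the entities that activity used, and which of them are marked as primary sources. The function name, the triple encoding of edges and the example data are assumptions made for the sketch; the edge directions follow the text (a generation edge runs from the entity to the activity, a usage edge from the activity to the entity).

```python
def match_source_pattern(nodes, edges, document):
    """Return a list of (activity, sources, primary_sources) for `document`.

    nodes: node id -> attribute dict.
    edges: iterable of (source, target, relation) triples.
    """
    edges = list(edges)
    matches = []
    for source, target, relation in edges:
        if relation == "wasGeneratedBy" and source == document:
            activity = target
            used = sorted(t for s, t, r in edges if r == "used" and s == activity)
            primary = [a for a in used if nodes[a].get("type") == "Primary Source"]
            matches.append((activity, used, primary))
    return matches


# Hypothetical provenance record for a report built from two sources.
nodes = {
    "a1": {"name": "Report"},
    "a2": {"name": "Field observation", "type": "Primary Source"},
    "a3": {"name": "Summary"},
    "ap1": {"name": "prepare document"},
}
edges = [
    ("a1", "ap1", "wasGeneratedBy"),
    ("ap1", "a2", "used"),
    ("ap1", "a3", "used"),
]
print(match_source_pattern(nodes, edges, "a1"))
# [('ap1', ['a2', 'a3'], ['a2'])]
```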

Appendix D. Study material
In this section, we provide further information on the material of the studies.

D1. Focus group
Below we list the guiding questions of our focus group (Section 2).
1. Could you describe typical day-to-day activities of an analyst?
2. What is the timeline for analysis? How long is the process?
3. What are the most useful analytical tools to help analysts to make connections?
4. For new analysts on the job, how is the ground knowledge about a topic formed? How do you get feedback on the quality of work done?
5. What are the techniques to identify new information requirements? What are the criteria to distribute the new queries?
6. What sort of biases can affect the analysis? How would an analyst prevent such biases?
7. Is trustworthiness of information sources important? What are other factors that lead an analyst to consider a hypothesis to be more reliable than others? How is previous analysis used for new tasks?
8. What kind of collaborations would an analyst be involved in? What is making collaboration effective and how do you communicate?
9. How much of the current role is assisted by technology? Where do you see the most significant areas for improvement using technologies that should be possible today?

D2. TAM Questionnaire
Below is a list of closed questions used for the experiment described in Section 5 following our TAM-A model. Closed questions required analysts to respond on a 5-point Likert scale (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree). Questions were provided to the analysts in a semi-randomised order, shown by the question numbers. We indicate questions that have been reported in the analysis with an inverted scale, meaning that it is the negated indicator that we would expect to contribute positively to the general factor.
Group 1: Experience
Analysts were given a demonstration video to watch before answering the questions. The video is included in the Supplementary material.

Appendix E. In-depth Study Results
In this section, we provide further information on the results of the study discussed in Section 5.

E2. PLS-PM Analysis
PLS-PM runs in two phases: the first establishes the viability of the measurement model and evaluates the correlations between the indicators and their represented factors, and the second evaluates the structural model, that is, the hypothesised relationships between factors (Chin, 2010; Sanchez, 2013). Factors are measured in a reflective way, which means indicators are consequences of the factor.
We note first that, as discussed in Section 5, the analysis is run on a small number of participants and, therefore, correlations are sensitive to small variations and hard to generalise.
The first phase of PLS-PM focussed on determining how well indicators represent their factors. The results obtained by loadings and cross-loadings are analysed to ensure unidimensionality of the factors, which indicates whether a factor is well represented by its indicators. We have removed indicators with zero variance, as they provide no information to the analysis (GRE1). We have also removed indicators too loosely correlated with their factors, where the loading was nearly zero (GRE0). Three cases where the indicator was highly inversely correlated with its respective factor were rotated (GEX3, GEX8 and GPS1). In this set of results, it is likely that lack of experience with collaborative analytical tools would better represent the level of experience with general tools, particularly in current experience (GEX8) and, for consistency, in previous experience (GEX3). Furthermore, disagreement with the ability to complete the job using CISpaces with due training correlates better with the perception of system control (GPS).
Following the relevant literature, in Table E.1 we report on the values used to assess the measurement model and correlation values for all factors.
Most of the values indicate homogeneity of indicators according to the recommended values in parentheses, but GPS is the most problematic, with low values of α and ρ, and GEX has low values of ρ, showing poor internal consistency and, therefore, poor unidimensionality.
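For readers unfamiliar with these measures, the sketch below shows how the two indices are commonly computed, assuming that α denotes Cronbach's alpha and ρ denotes Dillon-Goldstein's rho, as is usual in PLS-PM reporting. The function names and the example scores and loadings are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one factor.

    items: one list of participant scores per indicator (equal lengths).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two indicators are required")
    totals = [sum(scores) for scores in zip(*items)]  # per-participant sum
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / total_variance)


def dillon_goldstein_rho(loadings):
    """Composite reliability from standardised indicator loadings."""
    squared_sum = sum(loadings) ** 2
    residual = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + residual)


# Hypothetical data: three perfectly aligned indicators over five participants.
scores = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 2, 3, 4, 5]]
print(round(cronbach_alpha(scores), 3))                   # 1.0
print(round(dillon_goldstein_rho([0.9, 0.8, 0.7]), 3))    # 0.845
```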
The analysis of loadings and cross-loadings in Table E.2 shows that nearly 22/33 indicators have factor loadings above 0.7 (the recommended threshold) and, in addition, 23/33 load correctly on their factors, while the others load better on other factors, giving around 65% of well-representative indicators. The problematic indicators belong to factors GPS and GEX, which explains the low values of ρ and α. In contrast, PEOU, PU and BI are well represented by their indicators, suggesting that our grouping of the root (or exogenous) factors may need improvement, whereas the core TAM model is well represented.
The second phase of the analysis focuses on constructing and evaluating the structural model, that is, the strength of the relationships between different factors, using multiple regressions. We note that most of the factors present an average variance extracted (AVE) > 0.5 (reported on the diagonals in Fig. E.1), meaning that more than 50% of the variance of the indicators is accounted for. The exceptions are GPS and GEX which, as noted above, are not well represented factors. The overall prediction performance given by the Goodness of fit (GoF) is 0.76, slightly over the recommended value of 0.7. As above, the results we can draw are limited, and validation through bootstrapping cannot be achieved due to the low number of participants and the limited variance in some of the indicators. Our discussion here is therefore limited to considering which factors may contribute to or hinder the intention of using CISpaces.
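The two summary indices mentioned here can be sketched as follows, under the common simplifying assumptions that indicators are standardised (so the AVE of a reflective factor is the mean of its squared loadings) and that GoF is the geometric mean of the average communality and the average $R^2$ of the endogenous factors. The function names and numbers are hypothetical; the values reported in the paper come from the authors' own PLS-PM run, which may weight blocks differently.

```python
from math import sqrt
from statistics import mean


def average_variance_extracted(loadings):
    """Mean squared loading of a factor's standardised indicators."""
    return mean(l * l for l in loadings)


def goodness_of_fit(communalities, r_squared):
    """Geometric mean of the average communality and the average R^2."""
    return sqrt(mean(communalities) * mean(r_squared))


# Hypothetical loadings for one factor and R^2 values for two endogenous factors.
ave = average_variance_extracted([0.9, 0.8, 0.7])
print(round(ave, 3), ave > 0.5)                              # 0.647 True
print(round(goodness_of_fit([0.64, 0.81], [0.5, 0.7]), 3))   # 0.66
```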
For this, we can consider the strength of the relationships, which are also presented in Section 5. The structural model resulting from the partial least squares analysis is shown in Fig. 13, reporting the regression weights and the coefficients of determination of the factors, $R^2$. $R^2$ indicates the proportion of the variation in one factor that is dependent on the variation of the other factor. The effects between all variables but PEOU to BI are statistically significant at p < 0.05, and high values of $R^2$ indicate that most of the variance in PU, PEOU, BI can be explained by their independent

Fig. 5. A CISpaces view of an evidence-based sensemaking task on events happening in Kish.

$O_1$, $O_2$ and $O_S$) is presented in Fig. A.1.

Fig. A.1. WorkBox Argumentation Theory (WAT) and its translation first into ASPIC, and then into a Dung's Argumentation Framework to derive preferred extensions and hence intelligence analysis hypotheses.

A. Toniolo et al.

Fig. C.2. Patterns used to query a provenance chain of nodes $p_i$ in CISpaces.

Table 1
Summary of Challenges and Requirements for Intelligence Analysis, in relation to the focus group (FG) and our objectives (OJ1, OJ2, OJ3). A symbol • in column FG indicates that the focus group raises the challenge in a specific row. In the OJ columns, • indicates that the objective is designed to address the respective challenge in the row.
The against claim is then linked to the claim that originated our crowdsourcing request ($p_9$ in the figure), via a Con-link, $p_{23} \rightarrow_c p_9$, in line with our definition of critical questions. If one or more conclusions of LCS exist providing evidence for the claim (e.g. $p_{27}$), individual Pro-links are used to connect these conclusions to the aggregated for statement, e.g. $p_{26} \rightarrow_p p_{27}$; a single Con-link attacks the opposite against claim, $p_{27} \rightarrow_c p_{23}$. Similar links are added if one or more conclusions exist
Definition 6. A WAT is a tuple $\langle K, WAS \rangle$ such that $K$ is composed of propositions $p_i$, $K = \{p_j, p_i, \cdots\}$, where:
[$p_1, \cdots, p_n \rightarrow_p p_\phi$] is converted to $r_i : p_1, \cdots, p_n \Rightarrow p_\phi$
iii The contrariness function between elements is defined as: (i) if [$p_1 \rightarrow_c p_2$] and [$p_2 \rightarrow_c p_1$], $p_1$ and $p_2$ are contradictory; (ii) if [$p_1 \rightarrow_c p_2$] and $p_1$ is the only premise of the Con-link, then $p_1$ is a contrary of $p_2$; and (iii) if [$p_1, p_3 \rightarrow_c p_2$] then a rule is added such that $p_1$ and $p_3$ form an argument with conclusion $p_h$ against $p_2$, $r_i : p_1, p_3 \Rightarrow p_h$ and $p_h$ is a contrary of $p_2$.
14. During training, job related activities, or personal experience, I have previously encountered...
...GEX0: computer mediated analytical tools.
Group 3: Acceptability
24. BI: What would impede the adoption of the tool?
39. BI: If, from the analysis, you could automatically generate a text report (e.g. in PDF or Word) that summarizes the hypotheses, would you see this as an advantage?
40. BI: What else would you like to see in the tool?
41. BI: What would you see as main applications of CISpaces?

Table E.1
Unidimensionality measures, correlations of factors with AVE on the diagonal. Cells in italics indicate values outwith the recommended thresholds.

Table E.2
Loadings and cross-loadings for the measurement model, where bold values indicate the factors and * shows when the indicator loads with its factor.