Evaluating the impact of visualization of risk upon emergency route-planning

ABSTRACT This paper reports on a controlled experiment evaluating how different cartographic representations of risk affect participants’ performance on a complex spatial decision task: route planning. The specific experimental scenario used is oriented towards emergency route-planning during flood response. The experiment compared six common abstract and metaphorical graphical symbolizations of risk. The results indicate a pattern of less-preferred graphical symbolizations associated with slower responses and lower-risk route choices. One mechanism that might explain these observed relationships would be that more complex and effortful maps promote closer attention paid by participants and lower levels of risk taking. Such user considerations have important implications for the design of maps and mapping interfaces for emergency planning and response. The data also highlights the importance of the ‘right decision, wrong outcome problem’ inherent in decision-making under uncertainty: in individual instances, more risky decisions do not always lead to worse outcomes.


Introduction
Many fundamental spatial tasks involve decisions about an uncertain future. Uncertainty about future traffic conditions, for example, impacts the conduct and success of route planning tasks. Any such impacts are only amplified by safety-critical or emergency applications.
This paper examines how the choice of map-based representation affects the conduct and success of route planning, specifically in an emergency flood application. In our experiment, participants put themselves in the position of an emergency responder during a flood with important supplies that have to be delivered from an origin point 'A' to a destination at point 'B' as quickly as possible. Using six different map-based representations, participants are required to construct a multi-segment route between points 'A' and 'B,' while minimizing the likelihood of the route being blocked, in order to maximize the chance of successful delivery. The six map-based representations are chosen to cover a range of visual variables commonly used to depict uncertainty, including colors, shapes, sizes, and patterns. The chosen representations also contrast abstract versus metaphorical or mimetic symbolizations.
Our experimental design uses performance-based incentives, common in experimental economics to better control the choice environment and motivations of experimental participants (Bardsley et al. 2010). Our results show that participants do indeed make different route choices, which incur statistically significantly different levels of risk, using the six different types of map. Accordingly, the discussion and conclusions provide insights into the likely effects of choosing different types of map-based representation for emergency response and planning decisions. In particular, the results draw contrasts between abstract and metaphorical or mimetic representations of risk on a map; between participant preferences and participant performance; and between faster and slower participant decisions.
The paper also aims to make two broader methodological contributions, building on previous work (Cheong et al. 2016). First, our methodology distinguishes clearly between the participant decision and eventual outcome, an important distinction for risk and planning about uncertain futures (the so-called 'right decision, wrong outcome problem'). Second, our methodology again points to the important role of performance-based incentives for experiments exploring map-based decision-making under uncertainty.

Background
This paper is concerned with spatial decision-making under uncertainty. More specifically, it concerns spatial decisions based on visual communication of map-based information about risk, that is, the probability of a particular event occurring. Risk, then, gives rise to uncertainty in the mind of the decision-maker (cf. definitions of uncertainty in Worboys and Duckham 2004) about the possible outcomes resulting from any decision.
There exists a significant research literature that explores the different mechanisms available to cartographers for communication of risk and uncertainty, surveyed in Cheong et al. (2016) and Kinkeldey et al. (2014). Rather than revisit these recent and extensive reviews, this current paper briefly recaps the key findings below, taking Bertin's visual variables as its anchor point (Section 2.1). For further background on cartographic depiction of risk and uncertainty (in particular on the experimental evidence, and in some cases lack of evidence, to support conclusions about the efficacy of the different options), the reader is referred to Cheong et al. (2016) and Kinkeldey et al. (2014).
This background section focuses, instead, on two alternative topics in cartography, but with direct relevance to this study. First, we introduce a design distinction, orthogonal to the choice of visual variables: abstract versus metaphorical cartographic symbolization (Section 2.2). Both types of symbols are frequently found on emergency maps and are explicitly contrasted in our experiment. Second, we highlight the importance of these different cartographic choices in their application to emergencies, such as earthquakes, landslides, wildfires, and flooding. The relevant research literature, in particular in the area of emergency situational awareness, supports the idea that cartographic choices do indeed matter to the usability and effectiveness of emergency mapping (Section 2.3). Finally, Section 2.4 introduces the background to the specific experimental approach adopted by the experiments following in Section 3.

Cartographic symbolization of risk and uncertainty
Cartographic symbolization is founded on Bertin's six visual variables: color hue and value, size, shape, pattern, and orientation (Bertin 1983). Maps operate by encoding geographic information graphically using these different visual variables (MacEachren 1995). Mapping the level of risk or certainty associated with geographic information must similarly be founded on visual variables. For example, a substantial body of previous work has explored cartographic encoding of risk or uncertainty through color value (e.g. Schweizer and Goodchild 1992, McGranghan 1993, Van Der Wel et al. 1994, Leitner and Buttenfield 2000, Aerts et al. 2003, Edwards and Nelson 2012), color saturation (color 'purity' or 'intensity' as distinct from color value) (e.g. MacEachren 1992, Buttenfield 1993, MacEachren et al. 1998, Scholz and Lu 2014), and pattern (e.g. Howard and MacEachren 1996, Leitner and Buttenfield 2000). The effects of using symbol shape and arrangement (pattern) have similarly been explored for cartographic encoding of risk and uncertainty (Howard and MacEachren 1996, Pang et al. 1997, Pang 2001, Cliburn et al. 2002). In most cases, however, information about risk and uncertainty is additional or secondary to the primary mapped information (i.e. the geographic information with which uncertainty or risk is associated). For instance, in a responders' map of areas impacted by wildfire, information about the (most likely) locations impacted will usually occupy primary cartographic position, above secondary information about the certainty of impact at that location. In order to leave graphical room for encoding primary geographic information using Bertin's visual variables, other visual variables have been proposed and explored for encoding risk and uncertainty. Focus, fog, blur, and crispness were explored by MacEachren (1992). Fog, for example, can be used as an 'atmosphere' through which the map is viewed: the thicker the fog, the more uncertain the information.
Similarly, blur was used successfully by Scholz and Lu (2014) to improve decision-making under uncertainty. Their experiments found blur created the least confusion for participants when compared with other visual variables tested, including shape and color saturation. Viard et al. (2011) also evaluated transparency of pattern as a metaphor for uncertainty.

Abstract versus metaphorical symbolization
The cartographic literature on visual variables makes a distinction between abstract and metaphorical (also termed mimetic, iconic, or pictographic) symbolization (Slocum et al. 2008). For example, MacEachren et al. (2012) explored a range of icons, including targets and smiley-face icons, to represent accuracy, precision, and trustworthiness of data. The authors argue that abstract, geometric symbols using a single visual variable are well suited to tasks that require preattentive processing, that is, very rapid, subconscious processing (Healey et al. 1996). Iconic symbols, on the other hand, are associative or pictorial, prompting metaphors. MacEachren et al. (2012) argue that iconic symbols are potentially advantageous to use with qualitative aspects of data, such as uncertainty. In the same vein, Senaratne et al. (2014) discussed metaphorical 'thumbs up' and smiley-face icons for presenting uncertainty of power levels in data from smart grids. Notably, Bisantz et al. (1999) and Finger and Bisantz (2002) also used smiley-face icons in their landmark studies of decision-making under uncertainty. However, in this case the icon itself was used to encode the primary information (whether a potential target was hostile or friendly) while degraded resolution (i.e. akin to focus, above) was used to communicate the level of certainty.
In addition to iconic symbols, other visual variables have natural metaphorical associations with risk and uncertainty (e.g. Kardos et al. 2005). For example, red color hue is often associated with high risk or danger. Ash et al. (2014) used a red color scheme to visualize tornado warnings, and evaluated the effect on participants' fear levels. Red and green together were discussed as a potential encoding of risk in the context of landslides and slope stability in Davis and Keller (1997). In combination with red and green, amber can provide a 'stoplight' metaphor for encoding low (green) through moderate (amber) to high (red) risk or uncertainty. Griffin et al. (2014) tested the stoplight color scheme for representing uncertainty in combination with underlying map data, although in that case users found the combination confusing.
Visual 'sketchiness' is a more recent metaphorical approach to communicating certainty. 'Sketchiness' involves texturing graphics in a way designed to convey a hand-drawn or unfinished look (Tenneti and Duffy 2005). Increased 'sketchiness' tends to convey decreased certainty and confidence. A less sketchy, more precise 'look' tends to communicate higher levels of certainty and authoritativeness. Initial results explored by Wood et al. (2012) showed an increase in engagement and positive attitudes to participation with visualizations that were portrayed using sketchiness. In a follow-up study by Boukhelifa et al. (2012), sketchiness was found to be as intuitive as blur for communicating uncertainty, although participants expressed a preference for dashing over sketchiness and blurring. Other studies have begun to explore sketchiness as a mechanism for communicating uncertainty in maps, including Griffin et al. (2014) and Harris et al. (2016).
Despite this previous work, metaphorical or mimetic representations of risk and uncertainty have been less frequently studied than their abstract counterparts (Kinkeldey et al. 2014). One reason may be that associations with metaphors may vary culturally. For example, while red is commonly associated with hazard or danger in Western society, East-Asian cultures traditionally associate red with happiness and prosperity (He 2011). Hence, the results of testing metaphorical symbolizations may not generalize globally or between cultures. Acknowledging this limitation, this paper uses an experimental setting to explore the suitability of a variety of abstract and metaphorical symbolizations, including encodings using color associations, icons, and sketchiness.

Applications to emergency management and response
Effective decision-making in the presence of risk is essential to emergency management and response. Location is critical to effective emergency response (Kevany 2005). Accordingly, maps are acknowledged as a fundamental tool in understanding and assessing risk in emergency situations (Fraser 2010). Numerous previous studies have examined different techniques for mapping and geovisualisation of uncertainty and risk for decision-making in emergency applications, including landslides (Davis and Keller 1997), earthquakes (Goda and Song 2016), extreme weather (Cox et al. 2013, Burston et al. 2015), flooding (Frick and Hegg 2011, Seipel and Lim 2017), and wildfires (Cao et al. 2016, Cheong et al. 2016).
Mapping and geovisualisation have an especially important role to play in creating situational awareness for emergency managers and responders (Tomaszewski and MacEachren 2012). Situational awareness is concerned with assisting humans to make effective and timely decisions in complex, dynamic, and often stressful scenarios, such as emergencies and other safety-critical applications (Endsley 1995). As the volume and variety of big data streams generated during emergencies grows, so too does the need for systems to support improved situational awareness (cf. Virrantaus et al. 2009, Tomaszewski et al. 2011, Robinson et al. 2017). As a consequence, a significant body of research is concerned with the design of spatial information and visual analytics interfaces to facilitate situational awareness. This research addresses a diversity of advanced interface issues such as supporting collaboration between multiple stakeholders (for example, collaboration between emergency response teams in the field and emergency managers in the operations center, Resch et al. 2007, Fuhrmann et al. 2008); taking advantage of multiple interface modes, including tangible interfaces, speech, and gesture recognition (Rauschert et al. 2002, Hofstra et al. 2009); and virtual globes (Tomaszewski 2011) and mobile interfaces (Kim et al. 2007).
Nevertheless, despite the importance of such advanced interface designs, conventional maps remain a fundamental tool in supporting situational awareness in emergencies. MacEachren et al. (2011) report that 45 out of 46 experienced emergency managers regarded printable maps as an important function of a crisis management tool when emergency data is derived from social media. It follows that the design of cartographic products for situational awareness in emergencies remains an important research topic. The choice of map symbols, for example, is acknowledged to have an important role in the effectiveness of maps in emergencies (Friedmannová 2010, Robinson et al. 2011). The effectiveness of different map styles (e.g. choropleth and dot map) and use of map legends has been investigated by Stachoň et al. (2010). Similarly, the outcomes of this current paper aim to help inform cartographic design choices in maps for situational awareness and emergency decision-making scenarios.

Experimental approach
Spatial decision-making has been characterized as a process involving first retrieving information about the environment from a map; then weighing up decision alternatives; before applying a decision strategy to implement a decision (Gärling and Golledge 2000). The bulk of existing previous research work, discussed above, has tended to focus on the first step in that process, value retrieval tasks (Kinkeldey et al. 2015), rather than applying decisions. In a survey of 86 research articles on uncertainty visualization, Hullman et al. (2019) found that an order of magnitude more literature focused on the accuracy of information retrieved, as compared to some ground truth, rather than on the quality of actions or decisions resulting from the visualization. Further, relatively few studies evaluate empirically the outcomes of map-based decision-making under uncertainty (Kinkeldey et al. 2014).
Importantly, this distinction, between retrieving information from a map and making a decision based on that information, is fundamental to investigations of visualization of risk and uncertainty (as argued in Cheong et al. 2016, with reference to Kirschenbaum and Arruda 1994, Andre and Cutler 1998, Finger and Bisantz 2002, Kirschenbaum et al. 2013, Mason et al. 2014). The importance of this distinction is exemplified by the 'right decision, wrong outcome' problem: under uncertainty, a rational, low-risk decision that makes best use of the available information may, by chance, not result in the desired outcome. Conversely, a few high-risk decisions may, by chance, not be punished with undesired outcomes. In this environment, the representations of risk that facilitate information retrieval might not facilitate optimal decisions. The experimental approach adopted in this paper explicitly aims to redress this balance. Our experimental design is drawn from experimental economics, using performance-based incentives to elicit from human participants quantitative data on the performance of emergency decision-making, not simply map-reading performance. Our approach draws on and extends earlier work by Cheong et al. (2016), which presented the results of a series of experiments in which participants were asked to make decisions about whether to 'stay' or 'leave' a location under threat of impact by a wildfire. The experiments compared one textual and five different map-based cartographic encodings of the level and spatial distribution of wildfire risk. The study revealed that the different representations of risk can significantly impact the pattern of participants' decisions and the success of the outcomes resulting from those decisions.
This prior work also observed that the level of difficulty of the decision-making task can have a defining influence on the degree to which the choice of cartographic representation 'mattered.' With sufficient time and without distraction, participants tended to perform equally well in their decisions, irrespective of how the information about risk was presented to them. However, the same decision task under time-pressure did reveal systematic differences in participant performance (for example, maps encoding risk using a spectrum of hues outperformed more principled cartographic representations, such as color value ramps, Cheong et al. 2016).
Hence, this current paper aims to investigate the cartographic communication of information about risk in a complex spatial decision-making task: route-planning. Instead of the single, simple 'stay' or 'leave' decision in Cheong et al. (2016), participants must construct a safe route between pairs of locations through a road network, while minimizing the risk of being impacted by flooding.

Emergency route-planning experiment
Human participants were presented with an emergency decision-making task related to planning a route through a road network at risk of blockage by flooding. The experiment employed a within-subject design. Each subject was exposed to six treatment conditions. Treatment conditions involved different cartographical representations of the risk of blockage by flood across the road network.

Participants
Fifty-eight participants were selected from an ORSEE database (Online Recruitment System for Economic Experiments, Greiner 2015). Students studying geography or closely related disciplines were excluded to focus on students without specialized knowledge of map-reading. Participants in any previous experiments in a related study were also excluded from this experiment. Participants had a mean age of 23.62 years, with ages ranging from 18 to 55 years. The majority of participants had no experience with floods. All participants taking part in the experiment were undergraduate or postgraduate students, with an even distribution between male and female participants. Approximately 50% of participants listed English as their first language and the remainder were of Asian, South East Asian, or European descent. All participants had a good grasp of mathematics and none had formally studied GIS or cartography. Two participants out of fifty-eight had a known color vision deficiency.

Stimuli
Participants were presented with a series of route-planning tasks. In each task, participants had to plan and construct a route through a road network between two specified locations minimizing the likelihood that their route would be blocked. Figure 1 shows the experimental participant interface with an example stimulus. The interface was designed to be easy to use with minimal instruction; easily adapted and modified for multiple experimental setups; and easy to deploy in a range of experimental labs. JavaScript, D3.js, and PHP were used to construct the browser-based interface. Each road-map stimulus was stored as an image with GeoJSON used to store the road network and connectivity.
For each stimulus, participants had to plan and construct their route from the marked origin 'A' to destination 'B' by tracing their route along the existing road network using their computer mouse. Two action buttons were displayed at the bottom of the screen. Participants used the 'submit' button when they were ready to submit their route and move on to the next task. Participants could also use the 'Delete route' button at any point before submission to redesign their route. On submission, their route and any associated data captured were stored in the secure experimental database. In designing the interface, the maps were given the largest possible proportion of the screen and placed in the center of the participant's view. A legend showing flood likelihood was also included to the right of the image (Figure 1).

Treatment conditions
The experiment compared six different graphical representations of the likelihood of blockage (risk) along each segment of the route. Six was chosen as the maximum feasible number of treatment conditions given our controlled experimental design, which involves eight different stimuli per treatment condition (discussed further in Section 3.4).
The six representations of risk were chosen to cover a range of different symbolizations reviewed in Section 2.1. Three of Bertin's six visual variables were used to depict level of risk: color value, size, and pattern. These visual variables are well suited to encode relative quantity and to communicate a 'magnitude message' (i.e. more likelihood of blockage) (Mackinlay 1986). One of Bertin's visual variables, orientation, was omitted from our study, in case participants were confused by the orientation of symbols in the context of the inherently oriented nature of the experimental task: route-planning from 'A' to 'B' through a road map.
Bertin's remaining two visual variables, shape and color hue, are better suited to encoding qualitative differences rather than quantitative differences. As discussed in Section 2.2, an objective of this work was to evaluate abstract versus metaphorical encodings of risk. Color hue and shape were therefore varied to create metaphorical versus abstract representations of risk. These variations in hue and shape were overlaid across the color value, size and pattern treatments, creating a total of six treatment conditions.
The six treatment conditions are summarized in Figure 2 and detailed below. First, three treatment conditions using metaphorical associations with risk were designed (Figure 2), covering our three quantitative visual variables: color value, size, and pattern (Figure 2(a,c,e), respectively). Red color hue was chosen as an element in the design of all the metaphorical treatment conditions: as discussed previously, red hue has strong semantic association with warnings in many (although not all) cultures.
• Red color value, in which increasing 'redness' indicated increasing risk of flooding (Figure 2(a)). The color ramp was chosen using ColorBrewer (Brewer 2019) to be colorblind safe. There has been some recent discussion in the literature about whether darker or lighter colors better communicate more uncertainty to users (i.e. whether darker is more strongly associated with more certainty or more uncertainty, cf. Seipel and Lim 2017, Johannsen et al. 2018). However, in this case, increased risk of a flood blockage event seems natural to represent with darker, more intense colors.
• Red warning icons (shape), in which increasing size of a familiar warning symbol indicated increasing risk (Figure 2(c)). A red warning symbol with a white exclamation mark was chosen as a common metaphorical warning icon that can easily be associated with a clear 'magnitude message' (increasing the size of icon indicating increased flood risk). Symbol size (area) carries a highly effective magnitude message (Stevens 1975, Mackinlay 1986, MacEachren 1992, Munzner 2014).
Three corresponding abstract representations were designed as further treatment conditions, using blue color hue in the design of all the abstract treatment conditions (Figure 2). Selecting blue as our experiment's abstract alternative hue to red involved balancing a number of different options and considerations. First, our experimental design, already containing treatments comparing color value, size, and pattern in both abstract and metaphorical representations, necessitated that a single, consistent color be chosen for all three abstract representations. Gray-scale was considered first as a neutral alternative to red. However, gray-scale was found to be too low-contrast to be effective and easily distinguishable, in the context of the map backdrop and network elements required to enable participants to complete the routing task. Other color hues, such as green, were also considered. However, blue does have a relevant semantic association: with flooding and water. On the one hand, these semantic associations weaken the case for blue as an abstract alternative to red. On the other hand, those same semantic associations make blue a natural choice of hue for participants in our flooding example. Further, blue does not have relevant semantic associations with risk and danger, the salient focus of our experiment. Ultimately, blue was selected as the best compromise between these different options and considerations. We return to any potential effects of this choice in the discussion.
• Blue color value, in which increasing 'blueness' indicated increasing risk of flooding (Figure 2(b)), was chosen as an abstract alternative to metaphorical, red color value (Figure 2(a)). The blue color ramp used was selected using ColorBrewer and is colorblind safe.
• Blue circle symbols (shape), in which increasing size of a blue circle symbol indicated increasing risk (Figure 2(d)), as an abstract alternative to the metaphorical warning icon (Figure 2(c)). A round symbol was chosen for ease of distinguishing different sizes of symbols (Stevens 1975, Mackinlay 1986, Munzner 2014). As discussed above, changing symbol size is known to be an intuitive and effective visual variable for making ordinal distinctions (MacEachren 1992).
• Blue texture (pattern), in which increasing coarseness of blue stripes provides an abstract indication of increasing risk (Figure 2(f)). The blue texture was chosen as an abstract alternative pattern to the red 'sketchy' pattern (Figure 2(e)).

Scenarios
The spatial configuration of particular combinations of road network structures and origin/destination points had the potential to confound our results. Recognizing this as an extraneous factor, each treatment condition was tested across eight different 'scenarios.' Each scenario involved one of two road network layouts: a grid road network (Figure 3(a)) or a radial road network (Figure 3(b)). These road networks were designed to look as natural as possible, inspired by the cities of Melbourne and Paris, respectively. Further, four different pairs of origin 'A' and destination 'B' locations were tested with both grid and radial road layouts. In total, each participant therefore planned routes for 48 different map stimuli: eight scenarios (4 A-B locations × 2 networks) × 6 treatment conditions, with different graphical representations of risk.
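The count of 48 stimuli follows directly from crossing the three design factors. The following Python sketch enumerates that crossing; the labels used for networks, origin/destination pairs, and treatments are illustrative names chosen here, not identifiers from the experimental software.

```python
from itertools import product

# Illustrative labels only; the experiment's own identifiers are not given.
NETWORKS = ["grid", "radial"]
OD_PAIRS = [1, 2, 3, 4]  # four pairs of origin 'A' and destination 'B'
TREATMENTS = [
    "red color value",      # metaphorical
    "blue color value",     # abstract
    "red warning icon",     # metaphorical
    "blue circle",          # abstract
    "red sketchy pattern",  # metaphorical
    "blue texture",         # abstract
]

# A scenario is one network layout combined with one A-B pair.
scenarios = list(product(NETWORKS, OD_PAIRS))

# A stimulus is one scenario rendered under one treatment condition.
stimuli = [(net, od, t) for (net, od) in scenarios for t in TREATMENTS]

print(len(scenarios), len(stimuli))
```

Because every scenario appears under every treatment, differences between treatments cannot be attributed to a particular network layout or A-B pair.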

Participant payment structure
User studies, like this one, offer an opportunity to gather large samples of quantitative data on participants' use of maps and their deliberative process. Unfortunately, there are many reasons that user studies might yield biased or noisy data. User studies are time-consuming, effortful, and potentially boring, increasing the chances of thoughtless or noisy responses.
Experimental economists have developed protocols that are designed to overcome these problems, by both compensating participants for their time with a flat fee and awarding additional performance-based incentive payments. Incentive-based payments create a tangible link between the outcomes of decisions in the experiment, and outcomes for the participant. These procedures are acknowledged to offer particular advantages when used in studies involving risk (Bardsley et al. 2010). Hence, performance-based incentives are increasingly common in studies on visualization and decision-making involving risk and uncertainty, such as Cheong et al. (2016) and Merrill et al. (2019).
In our experiment each participant was paid a flat $7 participation fee. Participants could earn additional payments up to the value of $9.60 in performance-based payments. Participants were informed that they would receive $0.20 on each task if their selected route was not blocked, and $0 otherwise. In keeping with the narrative of a flood emergency, participants were also advised to keep their route as short as possible, although the financial rewards did not directly depend on the length of participants' routes. Performance-based payments could potentially amount to more than the participation fee, creating a clear incentive to invest effort in each task. In practice, the maximum amount earned by any participant was $13.60 AUD, the average payment was $11.94 AUD, and the minimum payment was $10.00 AUD.
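The payment rule can be written out as a short Python sketch. The function and constant names are illustrative assumptions; amounts are held in integer cents to avoid floating-point rounding, and the task count of 48 is taken from the experimental design.

```python
FLAT_FEE_CENTS = 700  # $7 participation fee
REWARD_CENTS = 20     # $0.20 per task with an unblocked route
N_TASKS = 48          # stimuli per participant

def payment_cents(n_unblocked: int) -> int:
    """Total payment in cents for a given number of unblocked routes."""
    if not 0 <= n_unblocked <= N_TASKS:
        raise ValueError("n_unblocked must lie between 0 and N_TASKS")
    return FLAT_FEE_CENTS + REWARD_CENTS * n_unblocked

def unblocked_from_payment(total_cents: int) -> int:
    """Invert the rule: number of unblocked routes implied by a payment."""
    extra = total_cents - FLAT_FEE_CENTS
    if extra < 0 or extra % REWARD_CENTS != 0:
        raise ValueError("payment is not consistent with the payment rule")
    return extra // REWARD_CENTS

max_bonus = REWARD_CENTS * N_TASKS           # 960 cents, i.e. $9.60
best_observed = unblocked_from_payment(1360)   # reported maximum, $13.60
worst_observed = unblocked_from_payment(1000)  # reported minimum, $10.00
print(max_bonus, best_observed, worst_observed)
```

Under this rule, the reported maximum and minimum payments correspond to 33 and 15 unblocked routes out of 48, respectively.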

Procedure
The experiment was conducted at the experimental economics laboratory at the University of Melbourne, Australia. Participants were invited into the laboratory and asked to sit in an individual booth with a desk and a computer. Each booth also contained consent forms, information regarding human ethics approval and the handling of deidentified data, together with a statement of instructions for completing the experiment.
The instructions informed participants about the payment schedule (see Section 3.5) and advised that the experiment would likely take between 30 and 35 min to complete. After participants read and signed the forms, and had read the instructions (Appendix), the experiment commenced.
Participants were first given a series of six training exercises to familiarize them with what to expect. The training exercises were similar to the scenarios in the experiment, except using a more simplified grid-based road network (see Figure 4).
After completing the training exercises, participants began the full experiment, constructing routes in response to the 48 different stimuli. Participants encountered each decision-making task only once. The order of tasks was randomized to mitigate any systematic learning effects. Participants were not given a time limit to complete the experiment. A preliminary study, conducted when designing the experiment, suggested that the route-planning task was already sufficiently difficult without additional time pressure. However, while no timer was visible to participants, the time participants took to make each decision was still recorded by our experimental system.
To minimize the effect of progress on performance, participants were not shown their earnings until the end of the experiment. At the end of the experiment, participants also completed a short questionnaire which included demographic questions, participant preferences for different representations of risk, and an opportunity to make any general comments. Participants were also asked about mathematical and map-reading abilities, flood experience, and whether they had a known color vision deficiency.
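One simple way to randomize task order so that each task is seen exactly once is to shuffle the task indices independently for each participant. The Python sketch below is an assumption about how this could be done, not a description of the experimental system; the function name, the seeding scheme, and the base seed are all hypothetical.

```python
import random

def task_order(participant_id: int, n_tasks: int = 48,
               base_seed: int = 2020) -> list[int]:
    """Return a reproducible random permutation of task indices.

    The per-participant seed (hypothetical scheme) makes each order
    reproducible for later analysis while differing between participants.
    """
    rng = random.Random(f"{base_seed}-{participant_id}")
    order = list(range(n_tasks))
    rng.shuffle(order)  # in-place Fisher-Yates shuffle
    return order

order = task_order(participant_id=1)
print(order[:5])
```

Because the result is a permutation, every one of the 48 stimuli appears once and only once, matching the requirement that participants encounter each task a single time.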

Calculation of risk along a route
Participants needed to infer the risk of an entire route remaining unblocked based on the combined risk of blockage along the selected segments that make up the route. The combined level of risk for each route was calculated as follows.
The chance of being blocked at any point along road segment $i$ is $r_i \in [0, 1]$. The chance of being unblocked at every point along segment $i$ is therefore $(1 - r_i)$. If the segment has length $L_i$, the chance of traveling along $i$ without being blocked is therefore $(1 - r_i)^{L_i}$. Across a sequence of segments $1, \ldots, n$, the overall chance of a route being unblocked is therefore:

$$\prod_{i=1}^{n} (1 - r_i)^{L_i} \qquad (1)$$

For example, in the network in Figure 5, segments AB, BD, and DE are the same length, while segment BC is twice as long, and every segment has the same chance of blockage ($r_i = 0.3$). Both routes are the same length and therefore have the same chance of being unblocked: using Equation (1), if $L_{AB} = L_{BD} = L_{DE} = 1$, then $L_{BC} = 2$, and the chance of being unblocked along route ABDE is $(1 - 0.3)^1 \times (1 - 0.3)^1 \times (1 - 0.3)^1 = 0.343$. The chance of route ABC being unblocked is $(1 - 0.3)^1 \times (1 - 0.3)^2$, which also equals $0.343$. Notice that participants can compare routes without knowing the absolute length of segments. To maximize the chance of being unblocked, a participant needs to weigh up the relative length of segments alongside the probability of being blocked at every point along these segments. Participants visually gauge the relative length of different segments as they would under day-to-day map-reading circumstances.

Figure 6 demonstrates two potential strategies for maximizing the chance of being unblocked. In Figure 6(a) the shortest route has been chosen with a length of 240 m, but this route passes through a zone with a high chance of being blocked. There is a 43% chance of being blocked. In contrast, the route in Figure 6(b) passes through a zone with a lower chance of blockage, but it is longer (575 m). It has a 57% overall chance of being blocked.
Using this calculation, each of the participant's chosen routes was determined to be blocked or unblocked based on the total risk encountered over the route. Participants were paid the $0.20 AUD performance incentive for all unblocked routes (see Section 3.5).
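The route-level calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the experimental system's code: the function name `chance_unblocked` and the representation of a route as a list of (risk, relative length) pairs are assumptions made here for the example. The values reproduce the Figure 5 comparison of routes ABDE and ABC.

```python
from math import prod


def chance_unblocked(segments):
    """Return the chance that a whole route is unblocked.

    `segments` is a list of (risk, length) pairs (a hypothetical layout):
    `risk` is the chance of blockage r_i in [0, 1] and `length` is the
    relative segment length L_i.
    """
    for risk, length in segments:
        if not 0.0 <= risk <= 1.0:
            raise ValueError("risk must lie in [0, 1]")
        if length < 0:
            raise ValueError("length must be non-negative")
    # Product over segments of (1 - r_i) ** L_i.
    return prod((1.0 - risk) ** length for risk, length in segments)


# Figure 5 example: every segment has risk 0.3; BC is twice as long.
route_abde = [(0.3, 1), (0.3, 1), (0.3, 1)]
route_abc = [(0.3, 1), (0.3, 2)]

p_abde = chance_unblocked(route_abde)
p_abc = chance_unblocked(route_abc)
print(round(p_abde, 3), round(p_abc, 3))  # 0.343 0.343
```

Because only the exponents matter, rescaling every length by the same factor changes both results in the same way, which is why routes can be compared using relative lengths alone.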
Figure 5. Example network (nodes A to E) illustrating the normalized risk measure for routes.
Figure 6. Example of two different strategies to navigate through a network: short route through high risk (left) and longer route through lower risk (right). In this case, the second route has a higher risk of blockage.
This scheme means that, on average, participants' performance will translate into increased incentive payments. However, note that it is not necessarily the case that any good (i.e. low risk) individual choice will result in an unblocked route, or vice versa. As in life, when making risky decisions about uncertain futures, chance plays a role in the individual outcome: sometimes good decisions go unrewarded and bad decisions go unpunished. We argue that this feature (termed the 'right decision, wrong outcome' problem in Cheong et al. 2016) is at the core of evaluating decision-making tasks, rather than evaluating simply map-reading tasks.
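The gap between average and individual outcomes can be illustrated with a small simulation. The sketch below assumes that each route's outcome is a single independent random draw against its overall chance of blockage; the text does not spell out the draw mechanism, so this is an assumption, as are the function names. The two chances of blockage are those of the Figure 6 example, and the $0.20 incentive is the one stated above.

```python
import random

INCENTIVE_AUD = 0.20  # payment per unblocked route, as stated in the text


def simulate_outcome(chance_blocked, rng):
    """Draw one outcome: True if the route turns out to be unblocked."""
    return rng.random() >= chance_blocked


def mean_payment(chance_blocked, n_routes, rng):
    """Average incentive earned per route over n_routes independent draws."""
    unblocked = sum(simulate_outcome(chance_blocked, rng) for _ in range(n_routes))
    return INCENTIVE_AUD * unblocked / n_routes


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Chances of blockage taken from the Figure 6 example.
safer, riskier = 0.43, 0.57

# Over many draws, the lower-risk choice earns more on average ...
long_run_safer = mean_payment(safer, 100_000, rng)
long_run_riskier = mean_payment(riskier, 100_000, rng)

# ... but in one head-to-head pair of independent draws, the safer route
# is blocked while the riskier route is unblocked with this probability.
p_wrong_outcome = safer * (1 - riskier)
```

Under these assumptions the lower-risk route pays roughly $0.114 per route in the long run against roughly $0.086 for the higher-risk route, yet in about 18% of paired draws the riskier choice alone is rewarded.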

Performance analysis: risk
The first set of results informs the hypothesis that different cartographic representations of risk lead to differences in the levels of risk in the routes chosen by participants. Figure 7 presents the distribution of risk incurred in each treatment condition. While the risk profiles follow a similar pattern, treatment-specific differences are evident. The red color-value treatment condition, for example, has a longer tail of high-risk routes (risk > 0.7) not evident under other treatment conditions. The texture treatment had the lowest maximum level of risk (risk of < 0.7). The warning icon had a second peak of low-risk routes. The highest-risk route was selected under the red color-value treatment with a risk of 0.932. This was considerably higher than the second-highest risk route of 0.774 for the sketchy treatment. The route with the lowest level of risk was chosen under the warning treatment, with a risk of 0.300. Participants in the red color-value treatment incurred the most risk, on average, followed by participants in the circle, blue color-value, sketchy, then warning treatments. Participants using the texture treatment selected routes with the lowest mean and median level of risk.
Analysis of variance (ANOVA) indicated that the level of risk did differ statistically across treatment conditions (p-value < 0.01). Hence, the results supported the hypothesis that participants selected routes with systematically different levels of risk when using different cartographic representations. This was investigated further using a post-hoc Tukey multiple pairwise comparison (Tukey honest significant differences). The results are presented in Table 1.
Thus, Table 1 shows participants incurred significantly higher levels of risk using the red color-value treatment when compared to the sketchy, warning, and texture treatments; using the circle-symbol treatments when compared to the warning-symbol and texture treatments; and using the blue color-value treatment when compared to the texture treatment. The data exhibited no significant differences in risk taken when comparing abstract representations (blue, circle, texture treatments) and metaphorical representations (red, warning symbol, and sketchy treatments).
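For readers unfamiliar with the test, the one-way ANOVA F statistic compares the variation between treatment means with the variation within treatments. The sketch below computes it in plain Python on made-up numbers; it does not use the study's data. To stay within the standard library, the p-value is estimated with a permutation test, a substitute for the usual F-distribution lookup, and the Tukey post-hoc comparison is not reproduced. All function and variable names are illustrative.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate the p-value by shuffling observations across groups."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)


# Made-up observations for three hypothetical treatment conditions.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(toy_groups)
p_value = permutation_p_value(toy_groups)
```

For these toy groups the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = 13 on (2, 6) degrees of freedom.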

Performance analysis: outcomes
Given statistically significant differences in the mean level of risk accepted by participants choosing routes across different treatment conditions (above), a next step is to assess whether these differences in risk levels lead to meaningful differences in the outcomes associated with different treatment conditions. Route outcomes not only reflect the average level of risk incurred under each treatment condition, but also the variation inherent in the risky process. Our hypothesis was that the choice of cartographic representation (treatment condition) can affect participants' success in routing. Logistic regression analysis was used to evaluate whether the differences in route outcomes are statistically significant. The odds ratios of being unblocked are presented in Figure 8(a). Logistic regression requires one treatment condition be selected as the benchmark. In Figure 8(a), the blue color-value treatment is used as the benchmark treatment condition, as it has the lowest overall routing success.

Table 1. Pairwise comparison of difference in mean risk by treatment condition. * denotes p-values < 0.05, ** denotes p-values < 0.01 and *** denotes p-values < 0.001. Numbers indicate differences in mean risk between row and column treatment condition (e.g. the sketchy treatment exhibited mean risk 0.018 less than the red treatment, significant at the 5% level).

           red         blue       warning    circle      sketchy
blue       −0.011
warning    −0.022**    −0.011
circle     −0.001      +0.010     +0.021*
sketchy    −0.018*     −0.008     +0.003     −0.017
texture    −0.030***   −0.019*    −0.008     −0.029***   −0.012
Despite some differences in odds ratios (e.g. that the red color-value treatment was 1.2 times more likely to result in successful, unblocked routing than the blue color-value treatment), none of the observed differences were statistically significant (Figure 8(a)). Aggregating by representation type (color value versus symbol versus pattern) and by representation style (abstract versus metaphorical) similarly exhibited no statistically significant differences. Hence, overall, even though the mean levels of risk taken in different treatment conditions were significantly different, the data did not support our hypothesis that cartographic representation affected overall routing success.
However, it was noticed that participants planning routes exhibited significantly higher chances of success (1.62 times, significant at the 99.9% level) using the grid network layout when compared with the radial network layout (Figure 8(b)). Restricting the analysis to scenarios using the more challenging radial network layout only, participants were found to be 1.63 times more likely to route successfully using the red color-value treatment, when compared with the blue color-value treatment (p < 0.01, Figure 9(a)). Aggregating together the metaphorical representations (red color value, warning symbol, and 'sketchy' pattern treatments), participants were also 1.28 times more likely to successfully identify an unblocked route (p < 0.05, Figure 9(b)). Hence, these results do lend qualified support to our hypothesis that cartographic representation affects routing success, at least in the case of routing using the radial network. This tentative result is examined more closely in the context of other results in the discussion, in Section 5.
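To make the odds ratios above concrete: when a logistic regression has a single categorical predictor, the fitted odds ratio of each level relative to the benchmark equals the ratio of the observed odds, so it can be read directly from counts of unblocked and blocked routes. The sketch below uses hypothetical counts, not the study's data, and assumes no other covariates in the model. The interval is the standard Wald approximation on the log odds ratio; function names are illustrative.

```python
from math import exp, log, sqrt


def odds_ratio(treatment, benchmark):
    """Odds ratio of success for `treatment` relative to `benchmark`.

    Each argument is a (unblocked, blocked) pair of route counts.
    """
    t_unblocked, t_blocked = treatment
    b_unblocked, b_blocked = benchmark
    return (t_unblocked / t_blocked) / (b_unblocked / b_blocked)


def wald_interval(treatment, benchmark, z=1.96):
    """Approximate 95% confidence interval for the odds ratio."""
    a, b = treatment
    c, d = benchmark
    log_or = log(odds_ratio(treatment, benchmark))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return exp(log_or - z * se), exp(log_or + z * se)


# Hypothetical counts: (unblocked, blocked) routes per condition.
benchmark_counts = (100, 100)
treatment_counts = (120, 100)

ratio = odds_ratio(treatment_counts, benchmark_counts)
low, high = wald_interval(treatment_counts, benchmark_counts)
```

With these counts the odds ratio is 1.2, but the interval spans 1, which is the sense in which an odds ratio above 1 can still fail to be statistically significant.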

Performance analysis: time
The third set of results related to the hypothesis that cartographic representation also affects the time taken by participants to plan their route. An analysis of variance supported this hypothesis, revealing that differences in the mean time taken by participants to plan a route under each treatment condition were statistically significant (p-value < 0.001). A post-hoc Tukey pairwise comparison presented in Table 2 shows where these differences lie. Participants' route-planning was up to 5.1 s faster when the information was represented using color-value (blue and red color-value treatments), when compared with other treatment conditions (p-values < 0.001). Participants were slowest in their decision-making when risk was represented using the texture representation (up to 7.8 s slower on average than the blue color-value representation).
However, the data again exhibited no significant differences when comparing the time taken to decide using abstract representations (blue, circle, texture treatments) versus metaphorical representations (red, warning symbol, and sketchy treatments).

Figure 9. Logistic regression: odds ratios for route traversal success for the radial ('Paris') network only: (a) under each treatment condition, relative to the blue color-value treatment (left); (b) for metaphorical-style treatment conditions (red color-value, red warning sign, red 'sketchy') relative to abstract-style treatment conditions (blue color-value, blue circle, blue texture) (right). Note: * denotes p-values < 0.05, ** denotes p-values < 0.01 and *** denotes p-values < 0.001.
As the experiment progressed, the mean time taken to plan routes became progressively faster across the cohort. Treatment-scenario combinations were shown in a randomized order, such that learning effects across the experiment were balanced; this suggests that participants progressively acquired route-planning skills that were not treatment-specific.

Participant preferences
Figure 10 summarizes participants' most-preferred methods for representing risk. The chart shows the frequency with which participants selected a representation as most preferred (light gray shaded bar) and least preferred (dark gray shaded bar). The blue and red color-value treatments were the most-preferred, with much lower preferences for the other treatments.

Figure 10. User preferences for representation of risk. The chart compares for each treatment condition the number of participants who selected that representation as their most preferred (light gray shaded bar) and least preferred (dark gray bar).

Table 2. Pairwise comparison of difference in mean time taken in seconds by participants by representation type: color (treatment conditions blue and red color-value); symbol (treatment conditions circle and warning symbol); and pattern (treatment conditions texture and sketchy). * denotes p-values < 0.05, ** denotes p-values < 0.01 and *** denotes p-values < 0.001. Numbers indicate differences in mean time between row and column treatment condition (e.g. symbol-based treatments required on average 4.3 s longer to decide than color-value treatments, significant at the 99.9% level).

           Color        Symbol
Symbol     +4.3 s***
Pattern    +5.1 s***    +0.7 s

The summary in Figure 10 shows only participants' most and least preferred representations. To include all rankings in comparing participant preference, the complete set of all rankings was combined into a single aggregate preference ranking using the Condorcet method. The Condorcet method combines multiple rankings on the basis of pairwise preferences between options. The analysis yielded an overall preference ranking, most to least preferred, of: blue, red, circle, texture, warning, sketchy treatments.
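The pairwise logic behind this kind of aggregation can be sketched as follows. The text does not say which Condorcet-consistent variant was used, so the sketch assumes a simple Copeland-style rule: each option scores one point for every other option it beats in a majority of rankings. The three ballots are hypothetical and the function names are illustrative.

```python
from itertools import combinations


def pairwise_wins(rankings, options):
    """Count, for each ordered pair (a, b), the rankings placing a above b."""
    wins = {(a, b): 0 for a in options for b in options if a != b}
    for ranking in rankings:
        position = {option: i for i, option in enumerate(ranking)}
        for a, b in combinations(options, 2):
            if position[a] < position[b]:
                wins[(a, b)] += 1
            else:
                wins[(b, a)] += 1
    return wins


def copeland_ranking(rankings, options):
    """Order options by the number of pairwise majority contests they win."""
    wins = pairwise_wins(rankings, options)
    score = {option: 0 for option in options}
    for a, b in combinations(options, 2):
        if wins[(a, b)] > wins[(b, a)]:
            score[a] += 1
        elif wins[(b, a)] > wins[(a, b)]:
            score[b] += 1
    # sorted() is stable, so tied options keep their input order.
    return sorted(options, key=lambda option: -score[option])


# Hypothetical ballots, each listing options from most to least preferred.
options = ["blue", "red", "texture"]
ballots = [
    ["blue", "red", "texture"],
    ["red", "blue", "texture"],
    ["blue", "texture", "red"],
]
aggregate = copeland_ranking(ballots, options)
print(aggregate)  # ['blue', 'red', 'texture']
```

Unlike a tally of first and last choices, this uses every position in every ballot, which is the motivation given above for going beyond the summary in Figure 10.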

Grid versus radial network
The two network types, grid and radial, were included as extraneous (confounding) factors. The results presented in Section 4.2 and Figure 8(b) show, however, that network type had an impact upon routing success. Further investigation of this unexpected result was warranted. Accordingly, Figure 11 presents the frequency histograms for the risk levels of the routes chosen by participants using the grid and radial networks. The distributions indicate higher overall levels of route risk and slightly lower variability in risk using radial network routing tasks (mean 0.533, standard deviation 0.080), when compared with routing using the grid network (mean 0.429, standard deviation 0.083).
From Figure 11, it appears participants found it more challenging to consistently identify lower-risk routes through the radial network than the corresponding task in the grid network. A likely explanation for this observed difficulty routing through the radial networks is the additional difficulty of comparing road segments with a wider variety of lengths. Judging the relative risk of two route segments is easier if both segments are the same length (i.e. compare only flood-likelihood level) or if both segments are the same flood-likelihood level (i.e. compare length). Judging the risk of two route segments of different lengths and different flood-likelihood levels is inherently harder because participants must integrate both length and flood-likelihood level. Generally, the radial network has a wider variety of route segment lengths (cf. Figure 3(a,b)) which arguably added to the overall difficulty of the task of routing through the radial network. Further experiments would be valuable for exploring this contention.
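The integration of length and flood-likelihood level that participants must perform can be made explicit using the per-segment term of the risk calculation in Section 3, $(1 - r)^{L}$. The sketch below solves for the risk level at which a longer segment is exactly as safe as a shorter one; the function name and the example values are illustrative assumptions, not part of the experiment.

```python
def break_even_risk(risk, length_ratio):
    """Risk at which a segment `length_ratio` times longer is equally safe.

    Solves (1 - r_long) ** length_ratio == (1 - risk) for r_long.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return 1.0 - (1.0 - risk) ** (1.0 / length_ratio)


# A segment twice as long as one with risk 0.3 must have risk below
# about 0.163 to offer a better chance of remaining unblocked.
r_long = break_even_risk(0.3, 2)
```

The break-even level is not simply half the original risk (0.15), which gives some sense of why mixed comparisons of length and flood-likelihood level are hard to make by eye.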

Summary of results
A summary of key results is presented in Table 3. The table ranks the mean risk, median time, and number of blocked routes by treatment condition, as well as the user preferences, ranked by Condorcet method (Section 4.4), for each treatment condition. The mean time taken was distorted by a small number of slow, deliberative decisions (in some cases up to 3 min). Hence, median time was a more faithful descriptor of central tendency for the skewed distribution of times.
There was a strong negative correlation between the median time taken to make a decision and the mean risk incurred along the chosen route. Higher risk routes were associated with faster decisions (Table 3). The Spearman's rank-order correlation between mean risk incurred and median time taken was −0.83 (p < 0.20). Hence, color-value based representations of risk were associated with faster decision-making together with higher levels of risk. At the other extreme, participants generally planned less risky routes and took longer to plan those routes when they encountered the texture and warning treatments.
There was also a moderate correlation between user preference and time taken (Spearman's rank-order correlation 0.77, p < 0.20). Hence, the representations participants prefer are associated with faster decision-making. The correlation between user preference and mean risk was slightly weaker (−0.6, not significant). No significant correlations were observed between outcomes (blocked routes) and risk, time, or preference.
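Spearman's rank-order correlation is the Pearson correlation computed on ranks rather than raw values. A minimal sketch follows, with tied values given their average rank. The six pairs of values are hypothetical placeholders, one pair per treatment condition, chosen only to show how a coefficient of about −0.83 can arise with six items; they are not the figures from Table 3.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for six conditions (not the study's data).
mean_risk = [0.47, 0.48, 0.49, 0.50, 0.51, 0.52]
median_time_s = [14.0, 15.0, 11.0, 12.0, 8.0, 9.0]

rho = spearman_rho(mean_risk, median_time_s)
print(round(rho, 2))  # -0.83
```

With only six items the coefficient can take a limited set of values, so even large coefficients carry wide uncertainty.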

Discussion
Clear patterns do emerge in participants' route-planning according to the method used to represent risk on a map. However, some of the patterns of difference ran somewhat counter to our expectations. In particular, a pattern of difference between abstract and metaphorical representations was expected (Section 2.2). For example, increasing red color-value might be expected to encourage more cautious decisions, as a result of its semantic associations with danger. Similarly, the sketchy and warning sign representations might be expected to prompt associations with risk and danger more readily, and hence increase performance. The results confound this expectation with no systematic differences observed between metaphorical and abstract representations in terms of levels of risk or time taken to decide. Instead, the results tend to highlight a difference in performance between color value (red and blue) and circle size representations on the one hand; and texture, sketchy, and warning symbols on the other. The former three representations tend to result in faster, more risky decisions; the latter three slower, less risky decisions (Table 3). Completing this picture is the further observation that users' less preferred representations also followed a similar pattern. Color value and circle size were most preferred; sketchy, texture, and warning least preferred (Table 3).
A likely explanation for this distinction is that color value (red and blue) and circle size are visually simpler, and so more intuitive to interpret. By contrast, patterns (texture and sketchy) and mimetic warning symbols are more visually complex, and so more effortful for participants to interpret. These more visually complex, effortful representations of risk might induce participants to pay closer attention to the map, and to consider their decisions more carefully. These results suggest that there may be some benefits to complexity. Such 'desirable difficulty' effects (also termed 'perceptual interference' effects) are already well known in psychological studies of human learning (Bjork 1994, Yue et al. 2013). While unduly complex maps would be undesirable, it may not necessarily be optimal to minimize complexity for the emergency spatial decision-maker. With this in mind, it was surprising to find that the red color-value treatment and metaphorical representations were associated with significantly better performance in terms of outcomes in one specific case: the radial network layout (Section 4.2). For the radial network layout only, the red color-value treatment was 1.63 times more likely to result in an unblocked route than the corresponding blue color-value map, significant at the 99% level (Figure 9(a)). The metaphorical representations taken together, which include the red color-value treatment, were also 1.28 times more likely to result in an unblocked route than abstract representations, significant at the 95% level (Figure 9(b)).
This apparent improvement in decision outcomes is initially puzzling, because red color-value representations were also associated with significantly higher levels of risk taken (see above). Further investigation indicated that while statistically significant, these results are not a reliable basis for drawing conclusions. Closer examination of the data revealed that a handful of cases (fewer than 20) of high-risk routes (risk of blockage between 70% and 80%) selected by participants through the radial network using the red color-value treatment happened, by chance, to be successful (30-40% blockage in outcome). Hence, in a small number of cases, the increased difficulty of routing through the radial network layout (Section 4.2) combined with the increased tendency of participants to select more risky routes using the color-value representations (Section 4.6) led, by chance, to better than expected outcomes.
In large samples, the rate of blockage should approach the level of risk incurred, as it does for the whole data set. This statistically significant result using 50% of the data (radial network layout) is therefore an effect of chance in a relatively small sample, rather than an instance of more risky decisions leading to better outcomes. Thus, even though in terms of routing success in the radial network layout the red color-value treatment outperformed the blue color-value treatment, and metaphorical outperformed abstract representations, these differences should be attributed to chance rather than a meaningful effect of representation. This is a concrete example of the difficulty of evaluating the quality of decision-making when facing uncertainty, in the form of the 'right decision, wrong outcome' problem or, in this instance, the 'wrong decision, right outcome' problem.
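The dependence on sample size can be illustrated with the binomial distribution. The sketch below assumes independent routes that all share the same chance of blockage, and uses illustrative values (five routes, each with a 75% chance of blockage) rather than counts from the experiment. It computes the chance that such a handful shows a blockage rate of 40% or less, and how the spread of the observed rate shrinks as the number of routes grows.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Chance of exactly k blocked routes out of n, each blocked with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k, n, p):
    """Chance of k or fewer blocked routes out of n."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def rate_sd(n, p):
    """Standard deviation of the observed blockage rate over n routes."""
    return sqrt(p * (1 - p) / n)


# Illustrative handful: 5 routes, each with a 0.75 chance of blockage.
p_lucky_handful = prob_at_most(2, 5, 0.75)  # at most 2 of 5 (40%) blocked

# Spread of the observed blockage rate for a small and a large sample.
sd_small = rate_sd(5, 0.75)
sd_large = rate_sd(500, 0.75)
```

Under these assumptions, roughly one handful in ten would look far safer than its true risk, while the spread of the observed rate falls by a factor of ten when the sample is a hundred times larger.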

Conclusion
This study set out to investigate whether the methods used to visualize map risk can make a significant difference to people's decision-making performance. The approach extended that of Cheong et al. (2016), who tested a simple, binary stay-or-leave emergency decision scenario, by investigating decision performance on a much more complex spatial task: emergency route-planning. Participants were asked to plan routes through a road network minimizing the chance of blockage by flooding. The results indicate that the map-based graphical representation of uncertainty does indeed have a significant effect upon the level of risk taken by participants in choosing their route.
The results further indicate a relationship between the increased length of time participants took to come to a decision, decreased participant preferences for the different graphical representations tested, and increased decision performance, in terms of total level of risk for a route. This is suggestive of a possible causal link: the cost of complex graphical representations in terms of the time, effort, and attention demanded of users may be compensated by increased decision performance using such representations.
These results tend to lend support to a growing body of work that suggests the most preferred visualization is not, in general, a good proxy for the best performing visualization (cf. Wilkening 2009, Quispel and Maes 2014, Cheong et al. 2016). As discussed above, there is a ready mechanism that could explain this observation. Arguably, more visually complex representations may demand more attention, thus more time and effort to use, and consequently be less preferred. Participants' most-preferred treatments tend to closely reflect the speed of decision-making, rather than the quality of decision. However, the additional attention, time, and effort demanded by less-preferred representations may in certain circumstances promote more considered, better decisions.
If so, this finding has particular relevance to decisions in safety-critical and emergency applications. In certain situations, speed may be paramount. In other situations, the magnitude of differences in decision speed (up to 8 s on average in our experiment, but potentially up to several minutes in individual instances; see Section 4.3) and the likely impact of adverse outcomes in safety-critical situations may militate in favor of decision quality over decision speed. Overlaid on top of considerations of both decision quality and decision speed is that of decision outcome. The 'right decision, wrong outcome' problem observed in our experiment (Section 5) again underlines that, under uncertainty, high quality, low-risk, slower, and more deliberative decisions can be relied upon to lead to better outcomes only on average, and not in individual instances.
There may also be more subtle influences of the 'right decision, wrong outcome' problem on participants' performance. This problem is expected to make it harder for participants to receive clear feedback on which representations facilitate better decision-making, whether in experiments or real decision scenarios. In turn, this scrambling of feedback may promote more transparent considerations, such as time, rather than decision quality in the preferences of map users. The results of this experiment would seem to fit such a pattern. Some representations, such as red and blue color-value maps, were associated with systematically more risky decisions by participants. The role of chance, as evidenced by instances of high-risk choices performing well in the red color-value radial maps, may inhibit users' ability to identify poor performance and learn from it. Future research might productively examine this question more explicitly: do different cartographic representations help or hinder learning from past performance in spatial decision-making under uncertainty?
Overall, our results tend to support those previous studies in which texture (Leitner and Buttenfield 2000), sketchy (Wood et al. 2012), and iconic or pictorial warning symbols (MacEachren et al. 2012) have been suggested as good cartographic choices for representing uncertainty for decision-makers. Although these representations were less preferred by participants and led to slower decisions, they tended to lead to significantly lower-risk route choices. Of course, results from quantitative psychometric experiments, such as described here, are notoriously difficult to tie tightly to practical high-level emergency applications. Bridging the gap between realistic visualization scenarios and the precise experimental control required for robust empirical results remains a challenge (Kinkeldey et al., 2014, Kinkeldey et al., 2015, Lickiss et al., 2017, Hullman et al., 2019). Consequently, further work is needed to explore these results further in more realistic decision scenarios than can be achieved using controlled laboratory experiments. Nevertheless, these initial results would seem to warrant such further practical and application-oriented exploration.

Ingrid Burfurd is a lecturer in economics at RMIT University. Ingrid holds a PhD in experimental economics from the University of Melbourne. She has a background in environmental policy and has previously worked as an economist for Victorian State Government and on a UN FCCC High-Level Panel.

Appendix. Instructions to Participants
Welcome! Please read the following information carefully.

Overview
During the course of this experiment you will be shown a series of interactive images. The images show maps that depict the likelihood of road blockages in the event of a flood.

Activity
For each image, you are an emergency responder at location A with important supplies that have to be delivered to location B as fast as possible. You will be asked to choose/decide and draw a route (using your computer mouse). The route you select should be as short as possible whilst also having a low likelihood of being blocked. When you have drawn a complete route from A to B, you should then press on the Submit button to save your chosen route. Once you have clicked on the Submit button, you cannot change your decision.

Time
In total, you will be shown approximately 48 images and should expect the experiment to take approximately 30-35 min to complete.

Payment schedule
You will be paid a base rate of $7.00 for participating in the experiment. You can earn up to an additional $9.60 depending on the route you have drawn. When you have drawn a route when the road is not blocked you earn an additional experimental payment of $0.20. You will not earn any additional payment when the road is blocked. A summary of the payments is shown below.

Questionnaire
After completing the series of scenarios, you will be presented with a short questionnaire to complete.
The total accumulated amount you will be paid, including any additional payments, will appear on the screen after you have completed the questionnaire at the end of the experiment.
If at any point you do not wish to continue the experiment, please inform the experiment staff and you will be paid the base rate of $7.00.
Thank you for your participation in this experiment.

Training Examples
When you are ready please click the 'Start training' button at the bottom of this page. You will be taken to an example scenario for pre-training before the experiment begins.