Does Updating Natural Hazard Maps to Reflect Best Practices Increase Viewer Comprehension of Risk?

In this study, we examine whether updating an interactive hazard map using recommendations from the literature improves user map comprehension. Analyses of experimental data collected from 75 university students revealed that map comprehension scores were not significantly better for those who viewed a “best practices” map than for those who viewed an existing version. This may be because the existing map was itself better than most other interactive maps. Additionally, we found that map comprehension had significant positive relationships with objective tests of spatial ability, but not with self-reported measures of spatial ability. Moreover, self-reported spatial ability had statistically significant, but only moderately strong, correlations with the objective tests. These results indicate that spatial ability should be measured objectively rather than through self-report in research on map comprehension. Further research is needed to examine the cognitive processes involved in hazard map comprehension, especially using a broader range of map characteristics and population segments with more diverse cognitive abilities.


Introduction
Government agencies use hazard maps, in print and online, to communicate environmental hazard risks. In many cases, maps made for experts such as geologists, engineers, land use planners, and emergency managers are shared with the public. However, these audiences have diverse levels of hazard knowledge and cognitive ability, which can produce confusion when maps contain technical or unnecessary information. As such, a one-size-fits-all approach to creating and disseminating maps that communicate environmental hazards and risk is potentially problematic [1,2].
Despite their widespread use, few studies assess the usability of hazard maps, and even fewer studies have identified map characteristics that are essential for people to accurately assess their risks. Thus, research is needed to (1) determine how maps currently published on hazard management websites compare to the best available map display practices, as outlined in summaries such as Dodge et al. [3]; and (2) determine if people's map comprehension is a function of stable individual characteristics such as spatial, verbal, and numeric abilities.
Some progress toward addressing the issue of map usability can be drawn from the broader research literature on people's interpretations of maps and, even more broadly, of visuospatial displays. However, most map studies examine people's map learning and memory and do not assess the inferences viewers draw from maps in real time while they view them [4]. This gap is an important limitation because real-time viewing is typically how people use hazard maps.
The purpose of our study is to explore whether updating an interactive hazard map using best practices helps improve people's comprehension of risk. We also consider how individual differences in cognitive ability affect map comprehension. The results of our research inform strategies to better communicate environmental risks to the diverse audiences who can use map information to prepare for natural hazard events. With $2.6 billion spent annually on preparedness in the United States [5], it is imperative that maps used to communicate environmental hazard risks are effective.

Literature review
In the following section, we summarize research evaluating hazard maps, and then turn to a discussion of map types, cognitive processes in map comprehension, mapping best practices, and determinants of map comprehension.
☆ This research was supported in part by grants from the National Science Foundation and from the Geological Society of America.
These publications explore a variety of dependent variables, such as viewer perceptions of risk, accuracy in identifying risk areas, preferences for map features, misconceptions about visualizations, and effects of user characteristics on performance. These studies concluded that risk area residents are better able to locate and orient themselves using aerial photographs and 3D maps with clearly labeled landmarks than with conventional contour maps [1,2,8,11], and that isarithmic maps produce better understanding than gradational shaded or binned maps. However, the color coding scheme and probability coding (numerical vs. verbal) also influence participants' judgments, at least among geoscientists and emergency managers [12]. Furthermore, confusion can occur when aspects of the map are poorly defined, such as when the map has too many or too few features or a confusing legend [24,33]. In addition, people draw important inferences about risk information that is not explicitly provided [7].
Overall, the hazard map studies listed above signify the importance of assessing people's perceptions of map characteristics such as perceived relevance and ease of understanding, as well as accuracy of interpretation.

Map types
To better understand the broader literature, it is important to recognize that spatial displays, of which maps are a specific type, can be classified as iconic, relational, or hybrid [34]. An iconic display represents spatial objects. An example of an iconic display is a road map because it represents the network of roads and the locations of landmarks in a geographical area. A relational display, such as a graph, represents nonspatial variables such as average rainfall in each month of the year or the correlation between education and income. A hybrid display combines an iconic display (e.g., a base map) with a relational display to provide a spatial representation of nonspatial categories or quantities, as when temperature ranges are represented by map contours [35]. Thus, hazard maps are hybrid displays.

Cognitive processes in map comprehension
Accurate interpretation of a spatial display requires viewers to (1) see the display clearly, (2) pay attention to relevant features, (3) develop a cognitive map, and (4) make inferences from their cognitive map to produce judgments, decisions, and actions [34]. The ability to see the display clearly is affected by factors such as visual element size and the degree of clutter in a display. Attention is influenced by "bottom-up" processes, in which visually salient features such as bright colors capture viewers' attention. It is also influenced by "top-down" processes, in which viewers' expectations direct their attention to specific display elements. These expectations are generated by schemas, also known as mental models, which are generic belief structures about entities, their attributes, and the interrelationships among those attributes [36]. People can have schemas of varying comprehensiveness about maps in general and, in particular, about the specific map content being displayed. Accordingly, people can range in knowledge from novice to expert in each of these domains. Another important contributor to the encoding process is the viewer's spatial ability, which, following Colom et al. [37], can be defined as the ability to generate, retain, retrieve, and transform visual images. Map inferences are determined by a viewer's goals, which can be self-generated (e.g., a desire to find the most direct route from one location to another) or externally imposed (e.g., an experimenter-assigned task to reproduce the map).
Most map research assesses the quality of the cognitive maps derived from physical maps or, to a lesser degree, from navigation through the environment. For example, many studies reviewed by Taylor [4] presented viewers with a map, asked them to study it, withdrew the map, and asked them to perform some task indicating the degree to which they learned the map's elements and their relationships (e.g., recall of landmarks, distances among points).
Only a few studies on map comprehension examine the basic elements of map reading skill [38-41]. Specifically, these are (1) symbol recognition: accurate interpretation of map symbols, (2) direction finding: the determination of geographical directions among landmarks using a map compass, and (3) scale use: the determination of actual geographical distances among landmarks using a map scale. In addition, more sophisticated maps, such as topographical maps, require (4) contour utilization: the determination of quantities such as elevations from the location of points within contours.

Mapping best practices
Maps can facilitate or impede viewers' map comprehension, depending upon the degree to which they are consistent with viewers' cognitive processes [11,12,42]. The impediments to map comprehension identified in the hazard map literature are consistent with a broader summary of the research literature on visual displays, which concludes that viewers' graph interpretations are a function of seven broad factors [43]. These factors include data complexity (e.g., the number of variables and categories within each variable), data display characteristics (e.g., the discriminability of graphical features such as object positions, lengths/areas, colors, and dimensionality), viewer tasks (e.g., retrieve point values, compare values, infer relationships), viewer prior content knowledge (expert vs. novice), viewer prior knowledge of display conventions (expert vs. novice), visuospatial abilities, and working memory.

Best practices for visual elements
Researchers have made a number of recommendations to increase map comprehension, concerning the choice of base map, the most important map elements to display, appropriate symbols and labels, and a clear hierarchical structure. For example, feature selection eliminates inessential map elements, and visual salience draws viewers' eyes to the most important features [44,45]. Other research investigates the shape, size, and color of map symbols. In particular, symbol shape ranges from abstract to iconic, with comprehension being fastest and most accurate for iconic symbols that do not need a legend [4]. Larger elements are easier to see and more readily attract attention, but elements that are too large can obscure other elements and clutter the map. Recommendations on color choice are outlined below.
Visual salience is often accomplished using color. There are five main recommendations for color choice. First, adapt color schemes to the type of data displayed, such as sequential schemes for data with increasing values (e.g., earthquake shaking intensities), diverging schemes for data whose values are above or below a critical value (e.g., temperatures above or below freezing), and qualitative schemes for nominal data (e.g., forest, lakes, and deserts are green, blue, and yellow, respectively) [12,44,46]. Second, use seven or fewer color classes when displaying data, because a greater number makes it difficult to match legend items with data layers [12]. Third, use color-blind friendly (CBF) color schemes, since 7-10% of the male population is red-green color-blind [8,12,46]. Fourth, use real-life colors to represent data when possible, such as blue for flooding and red for lava [15,44,45]. Finally, ensure that the colors in the legend match the colors on the map, because transparency options and base map imagery can obscure or change map colors [45].
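Two of these recommendations (matching the scheme family to the data type, and using seven or fewer classes) are mechanical enough to check automatically. The sketch below is a hypothetical illustration, not part of the study's rubric or software. The data-type labels and the function name `check_color_scheme` are assumptions.

```python
# Hypothetical checker for two color recommendations: match the scheme
# family to the data type, and use no more than seven color classes.
# The data-type labels below are illustrative assumptions.
RECOMMENDED_SCHEME = {
    "sequential_data": "sequential",  # e.g., increasing shaking intensity
    "diverging_data": "diverging",    # e.g., temperatures above/below freezing
    "nominal_data": "qualitative",    # e.g., forest, lakes, deserts
}

MAX_CLASSES = 7


def check_color_scheme(data_type: str, scheme: str, n_classes: int) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    expected = RECOMMENDED_SCHEME.get(data_type)
    if expected is None:
        problems.append(f"unknown data type: {data_type!r}")
    elif scheme != expected:
        problems.append(f"{data_type} should use a {expected} scheme, not {scheme}")
    if n_classes > MAX_CLASSES:
        problems.append(
            f"{n_classes} classes exceeds the recommended maximum of {MAX_CLASSES}"
        )
    return problems
```

For example, a nine-class qualitative legend for shaking intensities would be flagged twice: once for the scheme family and once for the class count.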

Best practices for content elements
Research on content choice has produced five recommendations. We use the term 'content' to refer to verbal or numeric information provided on or next to a hazard map. First, content must be current and accurate [2]. If hyperlinks are broken, data are old, or information is no longer valid, map users may lose trust in the map and disregard the information, thus impeding personal preparedness [1,47]. Second, incorporate engaging auxiliary information to personalize the hazards [1,7,8,16]. Auxiliary information could include local photographs of past events, personal stories, infographics, and protection measures. Another way to personalize interactive maps specifically is to include a search-by-address function and the ability to zoom to locations of interest [1,9,13]. Third, avoid specialized terms that many people are likely to misunderstand, such as 100-year flood, peak ground acceleration, and debris flow [13,15]. Fourth, use easily understandable terminology to explain what each data layer and colored zone represents [45]. If this is done properly, users do not need to seek more information to understand the map. Fifth, avoid or clearly explain verbal labels for quantitative variables such as probabilities. Terms such as "low", "medium", or "high" are confusing because there is substantial variation in the numerical values that people assign to these labels [12,22]. This problem can be minimized by providing probabilistic information in multiple formats, supplementing verbal labels with probability percentages (e.g., 30% probability), natural frequencies (e.g., 3 in 10), or graphics such as risk ladders [48], pictographs [49], or shaded displays [12]. Since people vary in their ability to process probabilistic information, presenting more than one descriptor type allows a wider audience to understand the data.
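As a minimal illustration of the multiple-format recommendation, the hypothetical helper below pairs a verbal label with a percentage and a natural frequency. The function name, the output wording, and the default reference class of 10 are assumptions chosen for the example.

```python
def describe_probability(p: float, label: str, denominator: int = 10) -> str:
    """Combine a verbal label with a percentage and a natural frequency.

    The reference class for the natural frequency (10 by default) is an
    illustrative choice, not a value taken from the study.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    count = round(p * denominator)  # e.g., 0.3 of 10 -> 3
    return f"{label}: {p:.0%} probability ({count} in {denominator})"
```

For rare events, a larger reference class such as 100 keeps the natural frequency from rounding to zero.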

An evaluation rubric for hazard maps
To develop the rubric, we conducted a literature review focused on effective map design, hazard maps as risk communication tools, and risk communication best practices. The review encompassed literature on both static and interactive maps, though most of it focused on static maps because fewer interactive map studies exist. The recommendations naturally separated into two categories: visual and content aspects of map design. Many of the recommendations were repeated in the literature, so we consolidated them to create the "high performance" criteria of the evaluation rubric. We then defined the moderate and poor performance criteria relative to these.
The resulting rubric has two sections with nine visual and nine content elements. For each element, a map can score from one (poor performance) to three (high performance) points. A map's total score is the points scored divided by the points possible. For example, a map that scores moderate on all items would have 36 points out of 54 possible, for a total score of 0.67. The rubric can be used for multi-hazard or single-hazard maps and for online or paper maps. Some rubric elements may not apply to every map. For example, visual rubric Element 6, "colors match hazard color," would not apply to an earthquake hazard map. In this case, the points for Element 6 would not be included in the total points possible. Table 1 summarizes the recommendations from the previous two sections for nine visual and nine content elements in the 'high performance' column of the hazard map evaluation rubric.
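The scoring rule can be sketched as follows. The sketch assumes each rubric element is recorded as 1, 2, or 3, with `None` marking an element that does not apply (such as visual Element 6 for an earthquake map). The function name and data layout are hypothetical.

```python
from typing import Optional


def rubric_score(scores: list[Optional[int]]) -> float:
    """Total score = points scored / points possible.

    Each applicable element scores 1 (poor), 2 (moderate), or 3 (high);
    None marks a non-applicable element, which is dropped from the
    points possible.
    """
    applicable = [s for s in scores if s is not None]
    if not applicable:
        raise ValueError("no applicable rubric elements")
    if any(s not in (1, 2, 3) for s in applicable):
        raise ValueError("each applicable element must score 1, 2, or 3")
    return sum(applicable) / (3 * len(applicable))
```

With all 18 elements scored as moderate, `rubric_score([2] * 18)` gives 36/54, or about 0.67. Marking an element `None` removes its 3 points from the denominator rather than counting it as a zero.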

Cognitive abilities
Although some scholars suggest more complex models (e.g., Refs. [50,51]), others propose that spatial abilities can be defined primarily by two factors: spatial visualization and spatial orientation. Spatial visualization is the ability to manipulate or transform the image of spatial patterns into other arrangements ([52], p. 173). Spatial orientation is "the ability to perceive spatial patterns or to maintain orientation with respect to objects in space" ([52], p. 149). In addition, a third spatial ability that seems particularly relevant to map comprehension is spatial scanning, which refers to "speed in exploring visually a wide or complicated spatial field" ([52], p. 155).
Multiple studies find that individuals who have higher levels of spatial ability are better at interpreting and applying map information [38,53-55]. The types of spatial abilities that predict performance on spatial tasks depend on the scale of the representation. Specifically, spatial abilities at small (object) and large (environmental) scales are distinct even though they are positively correlated [56]. Environmental-scale tasks require a distinction between survey knowledge and route knowledge. Survey knowledge involves an allocentric perspective of map elements and their relationships (i.e., aerial view), whereas route knowledge involves an egocentric perspective (i.e., street view) that is defined by the sequence of steps required to move from one location to another [57]. Moreover, relevant spatial abilities also depend on the type of spatial task. For example, in studies of map utilization, the map is continuously present (e.g., Ref. [35]). By contrast, studies of map learning require the recall and reproduction of map elements (e.g., Refs. [57,58]).
Although there does not seem to be any research on this topic, it is also possible that map comprehension and spatial ability scores are affected by a user's level of verbal ability. Map comprehension tests and spatial ability tests require that test takers read or listen to verbal instructions about how to perform the task. As a result, complex instructions could depress scores on map comprehension or spatial tests for those with lower levels of verbal ability. If verbal ability is a significant predictor of map comprehension or spatial abilities, word choice becomes critical when designing experiments to test these factors.
Previous studies use a variety of instruments to measure cognitive abilities. These instruments fall into two groups: objective tests and self-report measures. Examples of objective cognitive tests include those developed by the Educational Testing Service [52] and Vandenberg and Kuse [59], which ask participants to perform various timed tasks. Each test measures a distinct cognitive ability. Instruments that measure self-reported or perceived abilities include the Santa Barbara Sense of Direction Scale (SBSOD, a measure of environmental-scale spatial ability), the Philadelphia Spatial Ability Scale (PSA, a measure of object-scale spatial ability), and the Philadelphia Verbal Ability Scale (PVA, a measure of verbal ability) [53]. Since their development, both objective and self-report measures have been used to investigate cognitive abilities [60-65]. Self-report measures are much simpler to implement, but more research is needed to determine how well they correlate with objectively measured cognitive abilities.

Metacognition
One neglected research question is whether those who have greater levels of map comprehension are able to assess their own performance accurately and conclude that the task is easy, a form of self-assessment known as metacognition [66]. Although one might presume that metacognitive accuracy is a given (those who struggle to comprehend a map would be aware of the task's difficulty for them), this is not necessarily the case. There is ample support for precisely the opposite finding, the Dunning-Kruger effect, in which less competent people are oblivious to their own ignorance [67].

Research questions and hypotheses
The research reviewed in the previous sections leads to four research hypotheses (RHs) and two research questions (RQs) that address the relationships among map comprehension, spatial abilities, and other cognitive abilities.

RQ1. Can map comprehension be meaningfully divided into a Basic Map Skill scale and an Advanced Map Skill scale?

RH1. Map comprehension scores of participants viewing a "best practices" hazard map will be significantly higher than those viewing an existing hazard map.

RH2. Objective spatial ability scores and self-report spatial ability scores will have significant positive correlations with each other but nonsignificant correlations with verbal ability.

RH3. SBSOD scores will have significant positive correlations with PSA scores but will have distinctly different correlations with other variables.

RH4a-b. Map comprehension scores will have significant positive correlations with (a) objective and (b) self-report spatial ability scores.

RQ2. Are map comprehension scores positively correlated with metacognitive awareness of performance?

Procedure
To test these research hypotheses and research questions, we randomly assigned participants to a two-group between-subjects experimental design in which half of the participants viewed the existing map and the other half viewed the best practices map (Fig. 1). Participants in both groups began by taking three timed objective tests of spatial ability. After completing the spatial tests, participants logged on to the hazard map and answered a questionnaire. The questionnaire comprised a map comprehension quiz, three self-report spatial ability scales, a self-report verbal ability scale, and demographic questions. A total of 75 Boise State University students in introductory-level courses participated in exchange for extra credit toward their course grade. The protocol was approved by the Boise State University Institutional Review Board.
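The paper does not describe how the random assignment was carried out. A minimal sketch, assuming a simple shuffle-and-split into two groups of near-equal size, might look like the following. The function name `assign_groups` and the group labels are hypothetical.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into two groups of (near-)equal size.

    Hypothetical helper: shuffles the IDs with an optionally seeded RNG
    and assigns the first half to HM1 and the rest to HM2.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # seed only for reproducibility
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"HM1": ids[:half], "HM2": ids[half:]}
```

With 75 participants, this yields groups of 37 and 38. Recording the seed makes the assignment reproducible for later auditing.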

Hazard map development
Participants assigned to the existing map were directed to the Oregon HazVu: Statewide Geohazards Viewer (www.oregongeology.org/hazvu/), referred to below as Hazard Map 1 (HM1; Fig. 2). We selected this viewer because it is currently in use, displays multiple hazards, and has procurable data layers.
We constructed the best practices hazard map by first developing a rubric consisting of best practices in hazard mapping and science communication from the literature described above (Table 1; see Appendix A for full rubric). We then applied the rubric to HM1 to identify areas of improvement that were then implemented to produce the "best practices" hazard map (HM2; bit.ly/dataview2) using ArcGIS Story Map software (Fig. 2). Finally, all hazard data in HM1 were imported to populate HM2. In addition to updating data colors and map legend terminology, HM2 also included a side panel with auxiliary information, historical photos, definitions, and further explanation of legend items to help put the data in context. In all, HM2 involved 21 changes to HM1 (Appendix B). There were 15 specific changes in the visual criteria involving 7 of the 9 rubric items. In addition, there were 6 specific changes in the content criteria involving 6 of the 9 rubric items, with some addressing more than one rubric item.
Fig. 1. After completing the timed spatial tests, students view HM1 (student on left) and HM2 (student on right) and fill out the map comprehension questionnaire.

Map comprehension, spatial ability, and cognitive ability measurement tools
The map comprehension scale comprised 13 questions in two categories covering the basic elements of map reading as well as more advanced skill in map interpretation (Table 2). Specifically, two items addressed participants' compass utilization, two items measured scale utilization, two items measured participants' ability to use the compass and scale in combination, two items measured legend utilization, and five items measured risk interpretation. The mean over the six items addressing compass utilization, scale utilization, and compass and scale in combination yielded a scale of Basic Map Skill. The mean over the seven items measuring legend utilization and risk interpretation yielded a scale of Advanced Map Skill. The internal consistency reliabilities for these two scales were α = .54 and .52, respectively.
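The two scale scores are simple item means. The sketch below assumes each item is scored 1 for a correct answer and 0 otherwise. It also assumes, hypothetically, that the six basic items come first in the response vector; the actual item order is given in Table 2, not here.

```python
from statistics import mean

# Hypothetical item ordering: indices 0-5 are the compass, scale, and
# compass-plus-scale items; indices 6-12 are the legend and risk items.
BASIC_ITEMS = range(0, 6)
ADVANCED_ITEMS = range(6, 13)


def map_skill_scores(responses):
    """responses: 13 item scores (1 = correct, 0 = incorrect) for one participant.

    Returns (Basic Map Skill, Advanced Map Skill) as proportions correct.
    """
    if len(responses) != 13:
        raise ValueError("expected 13 item scores")
    basic = mean(responses[i] for i in BASIC_ITEMS)
    advanced = mean(responses[i] for i in ADVANCED_ITEMS)
    return basic, advanced
```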
The three objective measures of spatial ability were selected from a series of cognitive tests published by Educational Testing Service (ETS) [52]. The Paper Folding test measured visualization, the Cube Comparison test measured spatial orientation, and the Map Planning test measured spatial scanning. The Paper Folding test requires people to select which of five options represents how a sheet of paper that has been folded and then hole-punched looks when it is unfolded. The Cube Comparison test requires people to determine whether two cubes, each showing three faces with various designs, numbers, or letters, are different cubes or the same cube rotated to present different faces. The Map Planning test assesses people's ability to find the shortest route between two points in a stylized street grid that is partially obstructed by roadblocks. All three tests required participants to answer as many questions as possible within 3 min, and each test was hand-scored as the total number of correct responses. The estimated reliabilities of these tests range from 0.75 to 0.92 for Paper Folding, 0.77 to 0.89 for Cube Comparison, and 0.75 to 0.94 for Map Planning [52].
The three self-report spatial ability measures are the SBSOD and PSA [53,68], as well as the Allocentric View scale (Appendix C). The SBSOD and PSA scales contain items describing the respondent's ability to perform a variety of tasks that require environmental- and object-scale spatial skills, respectively. For the SBSOD and PSA, participants responded to each item on a five-point Likert scale (Strongly Agree to Strongly Disagree) to indicate the degree to which it applied to them. These two spatial scales were supplemented by a newly developed Allocentric View scale containing self-report items that relate more directly to map interpretation. That is, the items in this scale supplement the predominantly egocentric items in the SBSOD. For the Allocentric View scale, participants responded to each item on a five-point scale (Not at all to Very Great Extent) to indicate its relevance to them. Participants also completed the PVA self-report measure of verbal ability, using a five-point Likert scale (Strongly Agree to Strongly Disagree) to indicate the degree to which each statement applied to them. Finally, they completed a Metacognition scale, a four-item self-assessment of their performance on the map comprehension task, again using a five-point Likert scale (Strongly Agree to Strongly Disagree) to indicate the degree to which each statement applied to them.
After factor analysis and scale analysis, the SBSOD score was computed from the mean of all items except Item 9 (α = 0.89), the PSA score was computed from the mean of Items 5-13 (α = 0.86), the PVA score was computed using the mean of Items 1, 2, 6, and 7 (α = 0.64), and the Allocentric View score was computed using the mean of all five items in that scale (α = 0.77). The Metacognition score was computed using the mean of all four items in that scale (α = 0.77). Variable labels are shown in Table 3.
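The internal consistency reliabilities reported above are Cronbach's α values. The paper does not state which software computed them. As a sketch, α for k items is k/(k - 1) × (1 - sum of item variances / variance of total scores), which can be written with the standard library as follows. The function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: a list of k item columns, each a list of n respondents' scores.
    Uses sample variances throughout; the ratio is unchanged as long as
    item and total variances use the same denominator.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")
    # Each respondent's total score across the k items.
    totals = [sum(col[j] for col in items) for j in range(n)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))
```

To reproduce a reduced scale, such as the SBSOD without Item 9, one would pass only the retained item columns.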

Mean comparisons
The tests associated with RQ1 ("Can map comprehension be meaningfully divided into a Basic Map Skill scale and an Advanced Map Skill scale?") showed that scores on Basic Map Skill (M = .81) are significantly higher (t(71) = 2.14, p < .05) than those on Advanced Map Skill (M = 0.74). As indicated in Table 4, the two scales have a significant Pearson correlation (r = 0.23) and a nonsignificant Spearman correlation (r = 0.20) with each other. The small magnitude of both correlations suggests that map comprehension can be meaningfully divided into two relatively distinct skills.
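Basic and Advanced Map Skill were measured on the same participants, so one plausible reading of the reported t(71) is a paired-samples t test. The text does not say so explicitly, so treat this as an assumption. The sketch below computes the statistic only. The p-value would come from a t distribution on n - 1 degrees of freedom, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for the differences x - y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference / its standard error
    return t, n - 1
```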

Correlation analyses
To test the relationships between variables, we computed both Pearson and Spearman correlations (Table 4). We included Spearman correlations because the individual items cannot be assumed to be strictly interval or ratio level measures. However, the discrepancies between statistically significant Pearson and Spearman correlations range only from 0.01 to 0.06. Upon testing the 95% confidence intervals for each discrepancy, we found these differences to be nonsignificant (p > .05). As such, the following results report the Pearson correlation values.
Contrary to RH1 ("Map comprehension scores of participants viewing a 'best practices' hazard map will be significantly higher than those viewing an existing hazard map"), Table 4 shows that Map Type is significantly correlated only with Basic Map Skill and, unexpectedly, that this correlation is negative (r = -.27). That is, participants who viewed HM2 tended to have lower Basic Map Skill scores than those who viewed HM1. Moreover, Map Type also has significant negative correlations with ETS Map Planning (r = -.24) and Allocentric View (r = -0.24).
Mostly consistent with RH2 ("Objective spatial ability scores and self-report spatial ability scores will have significant positive correlations with each other but nonsignificant correlations with verbal ability"), the three ETS spatial ability tests have significant positive correlations with each other (average r = 0.49), and all three have significant positive correlations with PSA (average r = 0.35) and Allocentric View (average r = 0.31). However, Map Planning has the highest correlations with these two variables (r = .41 and .36, respectively) and is also significantly correlated with SBSOD (r = 0.26). Neither Cube Comparison nor Paper Folding is significantly correlated with SBSOD. Although not hypothesized, the ETS Cube Comparison and Map Planning tests have significant positive correlations with Metacognition (r = .32). Contrary to the hypothesis, PVA score has a significant positive correlation with PSA (r = 0.26).
Partially consistent with RH3 ("SBSOD scores will have significant positive correlations with PSA scores but will have distinctly different correlations with other variables"), SBSOD and PSA have a significant positive correlation (r = 0.52). Unexpectedly, however, they have similar positive correlations with Allocentric View (r = 0.56) and Metacognition (r = 0.35). The only notable difference in their patterns of correlations is that PSA is more strongly correlated with PVA (r = 0.25 vs. 0.07), but neither of these correlations is statistically significant.
Partially consistent with RH4a ("Map comprehension scores will have significant positive correlations with objective spatial ability scores"), both Basic Map Skill (r = .28) and Advanced Map Skill (r = 0.32) have significant positive correlations with Map Planning. However, only Advanced Map Skill has a significant positive correlation with Paper Folding (r = .29), and neither map comprehension scale has a significant correlation with Cube Comparison.
Contrary to RH4b ("Map comprehension scores will have significant positive correlations with self-report spatial ability scores"), the correlations of both map comprehension scales with all self-report spatial ability scales are nonsignificant.
The tests associated with RQ2 ("Are map comprehension scores positively correlated with metacognitive awareness of performance?") show that Metacognition has a significant positive correlation with Advanced Map Skill (r = .26) but not with Basic Map Skill (r = 0.21), although the difference between these two correlations is not statistically significant. Although not hypothesized, Metacognition and Allocentric View have a significant positive correlation with each other (r = 0.37).
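The paper does not state how the difference between the two Metacognition correlations was tested. Both correlations share Metacognition and come from the same participants, so a common choice is the Hotelling-Williams t test for dependent correlations. The sketch below implements that test as one possibility, not as the authors' procedure.

```python
from math import sqrt


def williams_t(r_jk: float, r_jh: float, r_kh: float, n: int):
    """Hotelling-Williams t for H0: rho_jk == rho_jh, where j is the shared variable.

    Returns (t, df); t is referred to a t distribution on n - 3 df.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Determinant of the 3 x 3 correlation matrix of j, k, and h.
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r_kh) ** 3
    t = (r_jk - r_jh) * sqrt((n - 1) * (1 + r_kh) / denom)
    return t, n - 3
```

Here j would be Metacognition, k Advanced Map Skill, and h Basic Map Skill, so a call would look like `williams_t(0.26, 0.21, 0.23, n)`. The value of n must be the number of complete cases, which is not reported for this comparison.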

Ordinary least squares (OLS) regression analyses
To further test the results from RH1 and RH4, Map Type and Map Planning were entered as potential predictors of Basic Map Skill and Advanced Map Skill. Table 5a shows the results of the analyses for the prediction of Basic Map Skill. The left-hand panel of the table shows that, after Map Type was entered at the first step, Map Planning failed to enter. Conversely, the right-hand panel shows that, after entering
To further test RH2, the self-report measures were entered as potential predictors of ETS scores. Table 6 shows that only PSA scores significantly predicted Paper Folding test scores (adjusted R² = 0.10 in the left-hand panel) and Map Planning test scores (adjusted R² = 0.14 in the right-hand panel), but not Cube Comparison scores (adjusted R² = 0.04 in the center panel). SBSOD scores did not significantly predict any of the ETS scores.
The validity of OLS regression analyses depends upon four assumptions: (1) linearity of the relationships between the independent and dependent variables, (2) independence of errors, (3) homoscedasticity (constant error variance), and (4) normally distributed errors. Tests following the procedures in Ott and Longnecker ([69], Chapter 13) were conducted on the data used in the regression analyses above. Assumption 1 is supported by scatterplots of map comprehension against each of the independent variables, which revealed no indication of curvilinearity. Assumption 2 is reasonable because the data are cross-sectional, so there is no serial autocorrelation. Finally, Assumption 3 is supported by residual plots showing approximately constant dispersion across all values of the independent variables, and Assumption 4 is supported by the linearity of the P-P plots of the standardized residuals.
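The P-P plot check for Assumption 4 can be sketched as follows. This is an illustrative version only. Residuals are standardized here by their sample mean and standard deviation, whereas regression software typically scales by the root mean squared error instead. The plotting position (i - 0.5)/n is one of several common choices.

```python
from statistics import NormalDist, mean, stdev


def pp_points(residuals):
    """Pair empirical and theoretical normal cumulative probabilities.

    Points lying close to the 45-degree line suggest approximately
    normally distributed errors.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    z = sorted((r - m) / s for r in residuals)  # standardized, in ascending order
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [((i - 0.5) / n, std_normal.cdf(zi)) for i, zi in enumerate(z, start=1)]
```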

RQ1: Can map comprehension be meaningfully divided into a basic map skill scale and an advanced map skill scale?
The ability to interpret a hazard map is an important skill. Many people need these hybrid visuospatial displays to determine whether they are in a hazard zone and, thus, whether they need to take action to protect themselves from hazard impacts. Basic and advanced map skills both require some knowledge of mapping conventions and some visuospatial skill. However, the results of the RQ1 analyses suggest that these two types of map skill are somewhat distinct: scores were significantly higher on Basic Map Skill than on Advanced Map Skill, and the two scales were not significantly correlated.

More generally, the fact that scores on Basic Map Skill (M = 0.81) were substantially less than perfect poses a challenge for developers of hazard maps. It means that people make errors when using the two most fundamental elements of these displays: the compass and the scale. Further research is needed to determine whether this lack of basic map skill replicates in samples that are more representative of the broader population. It seems likely, however, that map comprehension scores would be even lower in a general population sample than in a university student sample, which has been selected specifically for its higher level of cognitive ability. If so, research will also be needed to identify the specific impediments to successful use of the compass and scale. That research could then either develop training methods that improve basic skill or create displays that overcome these impediments.
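Every participant completed both map skill scales, so the claim that Basic Map Skill scores exceeded Advanced Map Skill scores is a within-subject comparison. A paired t-test is one standard way to make it. The text does not name the test used, so the sketch below is illustrative only:

- it assumes each scale is scored as proportion correct;
- it uses hypothetical scores for six participants.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired-samples t statistic and df for H0: mean(a - b) = 0."""
    if len(a) != len(b):
        raise ValueError("a and b must contain one score per participant")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical proportion-correct scores for six participants (not study data)
basic = [0.9, 0.8, 1.0, 0.7, 0.8, 0.9]
advanced = [0.6, 0.7, 0.8, 0.5, 0.7, 0.6]
t, df = paired_t(basic, advanced)
print(f"t({df}) = {t:.2f}")
```

The paired form removes between-person differences in overall ability from the comparison. This matters when, as here, the two scales are answered by the same people.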

RH1: Map comprehension scores of participants viewing a "best practices" hazard map will be significantly higher than those of participants viewing an existing hazard map
The lack of support for RH1 is quite surprising. Map Type not only had nonsignificant correlation and regression coefficients with Advanced Map Skill but also had a significant negative correlation with Basic Map Skill. A possible methodological explanation for the nonsignificant coefficients with Advanced Map Skill is that this variable has only modest reliability (α = 0.52), which would attenuate its correlations with other variables [70]. However, this explanation is contradicted by the finding that Advanced Map Skill had significant correlations with other variables. The scale therefore seems to measure a meaningful construct even though its reliability is lower than desirable. In any event, the map comprehension scales need further development to improve their psychometric quality.
Note. b* denotes the unstandardized regression coefficient; SE(b) denotes the standard error of the regression coefficient; β denotes the standardized regression coefficient.

Table 6
Regression of ETS scores onto self-report spatial scale scores. Note. b* denotes the unstandardized regression coefficient; SE(b) denotes the standard error of the regression coefficient; β denotes the standardized regression coefficient.
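Classical test theory makes the attenuation argument concrete in two ways:

- An observed correlation cannot exceed the square root of the product of the two reliabilities.
- Spearman's correction for attenuation divides the observed r by that quantity to estimate the correlation between true scores.

The sketch below applies both formulas to values reported in the text. It treats Cronbach's α as the reliability estimate, which is an assumption; the correction is only as good as those estimates.

```python
import math


def disattenuated_r(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimated correlation between the true scores."""
    return r_xy / math.sqrt(rel_x * rel_y)


def max_observable_r(rel_x: float, rel_y: float = 1.0) -> float:
    """Ceiling on an observed correlation implied by two reliabilities."""
    return math.sqrt(rel_x * rel_y)


# Reliabilities and correlation reported in the text
print(round(max_observable_r(0.52), 2))              # 0.72: ceiling for Advanced Map Skill
print(round(disattenuated_r(-0.08, 0.52, 0.86), 2))  # -0.12: PSA x Advanced Map Skill
```

For example, correcting the PSA x Advanced Map Skill correlation moves it only from -0.08 to about -0.12. Low reliability alone is therefore unlikely to be masking a substantial relationship in that case.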
An alternative explanation for the nonsignificant difference between map types is that the two maps did not differ meaningfully in their demands on Advanced Map Skill. This explanation has two variants:

- The changes made in transforming HM1 into HM2 were an inadequate operationalization of "best practices." Although this possibility cannot be ruled out definitively, it seems unlikely because, as noted above, the production of HM2 involved an extensive set of changes.
- HM1, the existing map, already met the participants' information needs for advanced map skill quite well, so the improvements implemented in HM2 had minimal psychological impact on the participants. This variant is consistent with the finding that HM1 already met many of the best practices.

To address this issue, further research should examine people's ability to process information from hazard maps that span a wider range of quality with respect to the rubric elements in Table 1.
The explanation for the negative correlation of Map Type with Basic Map Skill involves the software used to create HM2, which was built by uploading and formatting data and content in ArcGIS Story Maps. Story Maps offers many options but also has feature display limitations:

- Legend. The software displays the map legend only as a pop-up that appears when clicked. As the first author watched people navigate HM2, it was apparent that many of them failed to click on the legend, which makes accurate interpretation almost impossible. By contrast, HM1 had a legend that was always visible.
- Scale bar. Story Maps renders the scale bar in a fixed color regardless of the base map. Consistent with recommendations from previous studies, HM2 used an aerial image base map, against which its dark grey scale bar was somewhat difficult to see. By contrast, HM1 had a more visible scale bar and included measurement tools that could be used to measure distances precisely.

Because the map comprehension test included questions about distance, these scale bar and measurement differences would also have contributed to the slightly higher Basic Map Skill scores of HM1 viewers.

RH2: Objective spatial ability scores and self-report spatial ability scores will have significant positive correlations with each other but nonsignificant correlations with verbal ability
The partial support for RH2 is consistent with previous research. Specifically, the PSA has moderately high correlations with Map Planning (r = 0.39) and Cube Comparisons (r = 0.36), and a noticeably lower, but still significant, correlation with Paper Folding (r = 0.23). By contrast, the SBSOD has noticeably lower correlations with these three ETS tests (r = 0.26, 0.19, and 0.21, respectively). These results support the contention that the SBSOD and PSA, though highly correlated (r = 0.51), measure somewhat different constructs [56].
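The text contrasts the PSA's correlation with Map Planning (r = 0.39) with the SBSOD's (r = 0.26). Both correlations share Map Planning, and the PSA and SBSOD are themselves correlated (r = 0.51). Testing whether the two correlations differ therefore calls for a test of dependent correlations rather than a comparison of two independent rs. The sketch below implements Williams' t for this case, with df = n - 3. The text does not report such a test, and n = 75 is an assumption.

```python
import math


def williams_t(r12: float, r13: float, r23: float, n: int):
    """Williams' t for H0: rho12 = rho13, where variable 1 is shared.

    r12, r13: correlations of the shared variable with variables 2 and 3
    r23: correlation between variables 2 and 3
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix
    det_r = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3


# Map Planning (1) with PSA (2) and SBSOD (3); correlations from the text, n assumed
t, df = williams_t(0.39, 0.26, 0.51, 75)
print(f"t({df}) = {t:.2f}")
```

With the reported values, the statistic comes out at roughly 1.2 on 72 df. That is below the conventional two-tailed .05 cutoff of about 1.99.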
Moreover, consistent with RH2, PVA has nonsignificant correlations with Paper Folding (r = −0.04), Cube Comparisons (r = 0.14), Map Planning (r = −0.14), and the SBSOD (r = 0.06). Contrary to this hypothesis, however, PVA has a significant positive correlation with the PSA (r = 0.25). It is not obvious why this is the case. The instructions for all three ETS spatial ability tests and for the SBSOD are at about the same level of verbal complexity as those for the PSA. Further research is therefore needed to determine whether this finding can be replicated and, if so, explained.
As a practical matter, the poor predictability of the ETS tests from the SBSOD and PSA, as shown in Table 6, is unfortunate because the ETS tests are timed and, therefore, must be administered in a carefully controlled setting such as a laboratory. By contrast, the SBSOD and PSA are untimed and can be administered in an uncontrolled setting such as a mail or Internet survey. In turn, this restriction in ETS test administration limits the types of population segments that can be tested using these scales. Consequently, further studies of the effects of spatial abilities on map comprehension should administer the ETS tests in controlled settings.
RH3: SBSOD scores will have significant positive correlations with PSA scores but will have distinctly different correlations with other variables
Regarding RH3, the high correlation of the SBSOD and PSA is consistent with the conclusion of Hegarty et al. [56] that these two scales measure related but distinct types of spatial ability. The SBSOD measures spatial ability at the environmental scale (e.g., wayfinding), and the PSA measures spatial ability at the object scale (e.g., object manipulation). Support for this conclusion is particularly evident in the factor loadings in Appendix C. Moreover, the SBSOD's only significant correlation with an ETS test is with the Map Planning test. This is the only one of these tests that assesses a skill approximating wayfinding at the object scale.

Given the assumption that the PSA measures object-scale spatial ability, however, it is difficult to explain why this scale's highest correlation with an ETS test is also with the Map Planning test. The most logical explanation is that performance on the Map Planning test draws upon spatial ability at both the object and environmental scales. The present study extends this finding by showing that the PSA scale and the Map Planning test have similar patterns of correlations with Allocentric View and Metacognition, all of which have significant positive correlations with each other. However, the present results provide no support for the contention that the SBSOD and PSA have distinctly different correlations with other variables.

RH4a-b: Map comprehension scores will have significant positive correlations with (a) objective and (b) self-report spatial ability scores
Partially consistent with RH4a, Map Planning was significantly correlated with both Basic Map Skill (r = 0.27) and Advanced Map Skill (r = 0.32). In addition, Paper Folding was significantly correlated with Advanced Map Skill (r = 0.29) but not with Basic Map Skill (r = 0.08). However, Cube Comparisons was not significantly correlated with either measure of map comprehension. These results suggest that, of the three ETS tests, the Map Planning test provides the most direct measure of the cognitive skills required for map comprehension.
Contrary to RH4b, neither the SBSOD nor the PSA was significantly correlated with Basic Map Skill (r = 0.07 and 0.08, respectively) or Advanced Map Skill (r = −0.08 for both). Indeed, even the Allocentric View scale, which was constructed as a self-report measure of map comprehension, lacked statistically significant correlations with the two map comprehension measures. The Allocentric View scale does not appear to have suffered from variance restriction (SD = 1.03, approximately 20% of the scale range) or from attenuation due to unreliability (α = 0.77). Even so, there is some room for improvement in this scale and, as noted earlier, substantial room for improvement in the psychometric quality of the map comprehension scales.

RQ2: Do participants have a metacognitive awareness of their performance on map skills?
The results regarding metacognitive awareness showed that participants' assessments of their performance are significantly correlated with Advanced Map Skill. That is, those who were better at this task were able to assess their performance and conclude that the task was easier. This metacognitive accuracy is the opposite of the Dunning-Kruger effect, in which less competent people are oblivious to their own ignorance [67]. The finding suggests that feedback from the task itself gave poor performers an indication of the quality of their performance. In turn, map users who are experiencing difficulty are likely to recognize their need for general Help tabs if these are readily accessible. Indeed, the lower performance associated with the absence of a continuously visible map legend in HM2 suggests that context-dependent help features would be a particularly useful addition to hazard maps.

Study limitations & opportunities for future work
The first study limitation is the sample. Students are a subset of the general population that can be assumed to have higher levels of verbal and numeric abilities, because they are explicitly selected for admission on the basis of these cognitive abilities. It is less clear whether they have higher levels of spatial ability, because universities do not use this ability as an explicit selection criterion. If university students do indeed have generally higher levels of spatial ability, the absence of people who score low on this ability would reduce the variance in spatial ability scores. In turn, the sample correlations would underestimate the corresponding correlations in the general population [70]. Thus, it is possible that use of a student sample underestimates the magnitude of the correlations found in this study.

To overcome this sampling bias, future map comprehension studies should aim to recruit participants with a broader range of ages and abilities, so that samples better represent the population using these maps. With a more representative sample, we would expect larger correlations between variables. In practice, people may view hazard maps with a family member or friend, so future research could also test map comprehension in pairs or groups. Group discussion has been shown to improve reading comprehension [71] and may also improve map comprehension.
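The range-restriction argument can be quantified with Thorndike's Case II correction. It estimates the unrestricted correlation from the observed r and the ratio of unrestricted to restricted standard deviations on the selected variable. The sketch below rests on stated assumptions:

- selection is direct, on spatial ability;
- the standard deviations are hypothetical, since the text reports no population norms for these tests;
- only the observed r = 0.32 (Map Planning x Advanced Map Skill) comes from the text.

```python
import math


def correct_range_restriction(r: float, sd_restricted: float,
                              sd_unrestricted: float) -> float:
    """Thorndike's Case II correction for direct range restriction on one variable."""
    u = sd_unrestricted / sd_restricted  # ratio of unrestricted to restricted SD
    return r * u / math.sqrt(1 + r**2 * (u**2 - 1))


# Hypothetical: a student sample whose spatial-ability SD is 80% of the population SD
observed = 0.32  # Map Planning x Advanced Map Skill, from the text
print(round(correct_range_restriction(observed, 0.8, 1.0), 2))
```

When the two standard deviations are equal (u = 1), the correction leaves r unchanged. The more restricted the sample, the larger the upward adjustment.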
A second issue associated with this sample is that the students were not residents of the mapped area. This lack of familiarity with the area might have depressed map comprehension scores, especially for those with low spatial ability. To address this issue, future research on map comprehension should be conducted using samples of people who live in the mapped area.
A second study limitation arises from the type of map studied. Interactive hazard maps are fairly new, so this study is one of the few investigating how people view and interpret dynamic map information. One consequence of the scarcity of prior studies on dynamic maps is that many of the recommendations used to update HM1 were made primarily for plan-form maps. People may interpret maps differently online than in print, and recommendations for one format may not transfer well to the other. Thus, one future research objective should be to determine whether providing the same hazard information on a plan-form map and on an interactive map leads to comparable levels of user comprehension.
A related issue is that, with the increasing use of interactive hazard maps, more research is needed on both single- and multi-hazard maps. Open questions include how people navigate and use map features, how long they spend on the maps, and what kinds of information they absorb. Assessment of the cognitive processes and abilities involved in map comprehension could also be expanded, to identify more fully which abilities predict map comprehension and how they are recruited in processing hazard maps [72].
The third limitation concerns whether the regression models are correctly specified. The available literature on map comprehension indicates that many, if not most, of the relevant variables have been included. Yet the models in Tables 5 and 6 account for only about 4-14% of the variation in the dependent variables. This means either that the included variables need to be measured more reliably or that relevant variables were omitted from the analysis. The estimated reliabilities for the SBSOD (α = 0.89) and PSA (α = 0.86) are quite satisfactory. Those for Basic Map Skill (α = 0.54), Advanced Map Skill (α = 0.52), and PVA (α = 0.64) leave ample room for improvement. With regard to omitted variables, adding measures of numeric ability might improve the prediction of map comprehension. Further study is needed to test this possibility and to identify additional predictors of map comprehension.
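The reliabilities discussed above are Cronbach's α values. The sketch below shows how α is computed from item-level scores. It also adds the Spearman-Brown prophecy formula, a standard psychometric result not used in the text, to estimate how much lengthening a scale with parallel items might raise its reliability. The item data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha. items: one list of scores per item, aligned by respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(resp) for resp in zip(*items)]       # each respondent's total score
    item_var = sum(pvariance(item) for item in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def spearman_brown(alpha: float, factor: float) -> float:
    """Projected reliability if a scale is lengthened by `factor` with parallel items."""
    return factor * alpha / (1 + (factor - 1) * alpha)


# Hypothetical item scores (rows = items, columns = respondents), not study data
items = [[1, 0, 1, 1, 0], [1, 0, 1, 0, 0], [1, 1, 1, 1, 0]]
print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"Basic Map Skill (alpha = 0.54), doubled: {spearman_brown(0.54, 2):.2f}")
```

Under the parallel-items assumption, doubling the length of Basic Map Skill would be projected to raise α from 0.54 to about 0.70.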

Conclusions
This study provides a practical test of whether hazard map design and content recommendations are necessary to improve user comprehension of risk. We found that a "best practices" interactive map provided no improvement over an original interactive map. This may be because the original interactive map scored higher on the rubric than many other interactive maps. Consequently, although HM1 might be as effective as the "best practices" map (HM2), other hazard maps may need to be improved to reach the same degree of comprehension. Thus, government agencies should design their interactive hazard maps for the public by addressing the rubric elements in Table 1.
As expected, objectively measured spatial ability is an important determinant of people's ability to interpret map information. Specifically, spatial scanning, as measured by the ETS Map Planning test, was a somewhat better predictor of both measures of map comprehension than was spatial visualization (Paper Folding) or spatial orientation (Cube Comparisons). Unexpectedly, however, self-reported spatial ability did not significantly predict map comprehension and was a poor predictor of objectively measured spatial ability.
Many of the studies referenced above used individuals' perceptions of map objects and information to develop map recommendations. Our results suggest that objective, performance-based measures may provide a better basis for such recommendations. Nonetheless, the regression analyses accounted for only a small portion of the variation in map comprehension. More research is needed to assess the degree to which different factors contribute to high map comprehension.