Task-dependent sketch maps

ABSTRACT Sketch maps are considered a reliable method for assessing spatial knowledge. However, it is unknown whether all information types in sketch maps are reconstructed with similar accuracy under differing task instructions. Here, we show that the dominant information type and metric accuracy of sketches of a single environment drawn by the same person change across drawing tasks. Results are in line with the situated nature of spatial cognition: participants produce different types of maps, and different recall errors, depending on the task at hand. Understanding variation in measures of sketch maps is critical for designing standardized and automated methods of sketch map analysis.


Introduction
Sketch maps are a widely used method of measuring the memory of larger environments. Compared to other methods, such as distance estimation or judgment of relative direction tasks, drawing a sketch map gives the participant relatively large freedom in choosing what to externalize from their memory and how to do it. This property makes sketch maps an appropriate research tool when the researcher is interested in the different strategies with which participants orient themselves, specifically because sketch maps can capture diverse types of spatial knowledge (Appleyard, 1970). This is particularly helpful when one aims to provide route directions. People familiar with a city might vary in which detailed spatial information they remember (and how well), but they might still be able to successfully provide strangers with route directions using sketch maps that contain, emphasize, and schematize different things. This manuscript focuses on sketch maps containing routes.
Yet, it is unclear whether participants who have to communicate a single, well-known route externalize information in the same way and with the same accuracy in different experimental situations. This paper focuses on two characteristics of sketch maps produced under varied experimental instructions by the same participants, all familiar with the single environment being drawn: the sketch maps' type and the sketch maps' accuracy. Understanding how experimental instructions affect the type of sketch maps can point to a methodological issue in the field because it would indicate that not all experiments or experimental conditions are reliably comparable to each other. Understanding how instructions affect sketch maps' accuracy may provide insight into the variability of human performance in the sketch mapping task. This could contribute to improving the standardization of sketch map analysis.

Reliability of sketch maps
The key assumption of using the sketch map task as a measure of spatial memory is that the sketches are externalized based on the underlying memory of the environment. This has been tested by Blades (1990), who showed that people produce very similar sketch maps if they are asked to do this in two trials separated by a week. This showed that sketch maps indeed represent what people remember about space and that sketch maps are a reliable method of accessing the cognitive representation of large environments.
Although sketch maps reproduced in the paper by Blades (1990) look similar to each other, their accuracy was not measured computationally. It is possible that sketch maps that look similar will result in different computational metrics of accuracy. Understanding the magnitude of this variability is an important goal because it would allow us to design standardized sketch map analysis tools that are not overly sensitive to natural human variation when one is tasked with drawing similar sketch maps multiple times.

Evidence for potential variability of sketch maps
At the time of Blades' study (Blades, 1990) the dominant belief was that spatial memory of environments (the "cognitive map") is stable, i.e., that people remember some aspects of the environment, do not remember others, and that they will consistently recall the same information unless they learn something new between the trials. Since then, this assumption has been challenged by empirical evidence showing that human spatial cognition is situated: People recall different information and produce different error patterns depending on the task at hand (Tversky, 2009). This fact might affect the production of sketch maps because people might make different errors depending on the exact task for which the sketch map is being drawn. Such tasks are typically controlled by the experimental instruction given to participants. Therefore, people are likely to draw sketch maps of the same environment with different errors when given different instructions. The open question is to what extent different properties of sketch maps are affected by this phenomenon.
An effect of task instructions on the process of recalling information is known in general psychology. Anderson and Pichert (1978) asked participants to memorize a story and to recall information from it twice. The second time participants were asked to recall the story, they were instructed to do it from the perspective of a different character. The authors demonstrated that people recalled different types of information following this shift in perspective. They suggested that different phrasing of instructions prompted participants to use a different recall strategy.
In the domain of spatial cognition, there is evidence that variations in task instructions can modify the production of route instructions, although this has been demonstrated for verbal descriptions, and not for sketch maps. Golding et al. (1996) showed that participants who were asked for route directions strove to establish common ground (a shared level of knowledge) with the receiver of the instructions and that they aimed to address the receiver's goals. Hirtle et al. (2011) analyzed a corpus of route directions generated for different navigational activities. The authors demonstrated that routes provided for different activities vary by the type of spatial information they contain. Among the analyzed activities, Hirtle et al. (2011) considered "getting somewhere urgently" and "educational trip/sightseeing." Describing a route to sightseeing visitors was usually associated with providing a broader context about the space in which navigation will happen. The work by Hirtle et al. (2011) demonstrated that route directions might differ, depending on the receiver's purpose.
Findings from verbal route descriptions like these might be relevant to understanding the process of sketch mapping (Tversky & Lee, 1999). In experiments using sketch maps, it is customary to specify the sketch map's context, e.g., by asking to "draw a sketch for a friend visiting the city." The existing work on verbal route descriptions suggests that participants might draw different sketches depending on the familiarity and purpose of the receiver, although due to the scarcity of previous work it is difficult to hypothesize how these sketches would differ. In the current study, we manipulated two aspects of the drawing instructions: the receiver's familiarity with the area (familiar vs unfamiliar) and the receiver's purpose of the trip (urgent accident vs relaxed trip).
The work reviewed above indicates that sketch maps produced for different tasks might be of a different type, but it did not analyze the accuracy of instructions. From the theoretical standpoint, it could be expected that sketch maps of the same environment produced multiple times by the same participant will demonstrate some variability in accuracy, regardless of the experimental instruction. This is because some metric relations may not be directly retrieved from memory. Humans do not store a stable mental representation of larger environments that would be equally accurate for all parts of the environment (Tversky, 1993). Some aspects of the environment are remembered better than others and are therefore more likely to be directly extracted from memory when externalized. Metric information is especially prone to errors at recall, and multiple theories have been proposed to explain why. Ishikawa and Montello (2006) suggested that cognitive representations of larger environments are qualitatively metric, i.e., only contain approximate metric information but a good understanding of qualitative relations. Meilinger (2008) proposed the Network of Reference Frames Theory, according to which metric information is remembered for individual vista spaces but the relations between them are remembered in the form of a graph. Warren et al. (2017) extended these ideas to a model of a "labeled graph" mental representation. In it, topological connections between locations are stored as graph edges, while metric information is remembered as labels to edges (for distances) and as labels to nodes (for junction angles) of the graph. Remembered metric information therefore is not integrated into a globally consistent metric representation. It is possible to extract qualitative, as well as local metric, relations directly from such a graph-based mental representation, but this is not always possible for metric relations between distant locations.
Therefore, there is an incompatibility between human cognitive representations of large environments and the way in which they can be externalized when drawing a sketch map. Sketch maps enforce a globally consistent metric reference system: Participants are forced to provide metric relations between all elements in the sketch by drawing them somewhere on paper. If the metric information is only approximate in the underlying cognitive representation or only stored for local relations, then the metric relations between distant elements on sketch maps cannot be directly extracted from memory. Instead, they could be the result of combining multiple local (possibly each slightly erroneous) relations. The end result could differ in each instance of making a drawing because people routinely choose to make inferences over retrieving poorly memorized information (Benjamin, 2007). Because qualitative relations are easier to extract from a graph memory structure, it can be expected that differences between multiple instances of sketch map drawings will be more pronounced in metric relations, but less so in qualitative relations.

Sketch maps' type and accuracy
One of the biggest challenges in using sketch maps for experimental research is the choice of the method with which to analyze them (Montello, 2016). Before sketch maps are interpreted, researchers must decide which type of information to extract from them, and how to do so consistently across many different sketches. The most popular approaches focused on extracting sketch maps' type and their accuracy.
Within the former approach, sketch maps are classified into distinct categories based on the similarity in the type of features they depict (Appleyard, 1970). This method has now been formalized (Krukar et al., 2018, 2020) so that it can be applied consistently to different sketch maps, without the need for creating new (experiment-dependent) sketch type categories. Within this method, each sketch map receives two scores, on a scale 0-6: one for its route-likeness, and one for its survey-likeness. Route-likeness refers to features that are drawn in order to communicate the route that the receiver should follow. A higher route-likeness score indicates that the sketch map contains a larger diversity of features that can be used to communicate which way one should follow in order to reach the destination. Survey-likeness refers to features that are drawn in order to communicate information about the spatial surroundings of the route. A higher survey-likeness score indicates that the drawing contains a larger diversity of features that can communicate configural spatial relations between objects located beyond the route. Such information can be used, e.g., to plan shortcuts. Figure 1 presents examples of these two continua.
If participants draw task-dependent sketch maps, their sketches should differ in their type (i.e., result in different route-likeness or survey-likeness scores), regardless of the accuracy of information contained in them. In our study, different route- and survey-likeness scores also serve as a validation of the experimental procedure because they would show that participants indeed produce different sketches on each drawing occasion instead of mechanically reproducing their first sketch.
Sketch maps can also be analyzed with the focus on the accuracy of information depicted on them. Some of the methods available for that goal involve external judges grading "sketch map goodness" (Billinghurst & Weghorst, 1995; Krukar et al., 2020). Others suggested standardized ways to evaluate the correctness of qualitative relations between objects depicted in the sketch (Schwering et al., 2014). Probably the most widespread standardized method to evaluate the accuracy of spatial information in sketch maps is bidimensional regression (Friedman & Kohler, 2003; Tobler, 2010). It measures the distortion of the shape defined by the locations of landmarks depicted on the sketch map, compared to the shape defined by the locations of landmarks on a topographic map. One recognized disadvantage of this method when applied to sketch map analysis is that it is sensitive to all metric errors present on the sketch, regardless of their importance for the intended user of the sketch. Thus, when bidimensional regression differs across multiple sketch maps, this indicates that survey knowledge was extracted with different accuracy, but not necessarily that sketch maps are of different value for the task at hand.

The present study
The purpose of this paper is to test how the situated nature of spatial cognition affects sketch maps. Do people produce sketches of varying quality if asked for it repeatedly, under different instructions? And if yes, which instructional differences produce divergent sketch maps? We focus on sketch maps containing routes and serving as a means of communicating route directions to others.
We manipulated the context of the sketch map by varying the experimental instruction provided to participants (the sketch map being produced for a receiver familiar/unfamiliar with the area, and for an urgent accident/relaxed trip scenario). We measured its effect on the produced sketches by evaluating the type of each sketch map, and its metric accuracy.
Based on the reviewed literature it can be expected that: Main Hypothesis: Sketch maps differ depending on the drawing instructions.
H1: Participants will produce different types of sketch maps, depending on the receiver's familiarity and purpose. We verify this by checking if the route-likeness scores (H1a) and survey-likeness scores (H1b) significantly differed across the experimental conditions. Since all maps produced in the experiments were for the purpose of providing navigational guidance to a person on a bicycle, we expected there would be no effect on route-likeness but a significant effect on survey-likeness.
H2: Participants will produce sketch maps with different levels of metric accuracy, depending on the receiver's familiarity and purpose. We verify this by checking whether the mean accuracy of produced sketches differed across experimental conditions (H2a) and, in an analysis of within-subject variance, by checking whether the variance of each participant's results differed from zero (H2b).

Open science statement
Our research questions and the main hypotheses were formulated before collecting the data, but the analysis code was written afterward. Due to the scarcity of previous work on the subject, we have made no specific predictions within Hypothesis 2a (i.e., whether accuracy would be larger or smaller for each given experimental condition). These results should therefore be treated as exploratory (Wagenmakers et al., 2012) and as a postdiction analysis (Nosek et al., 2018). An important contribution of the paper is the raw data and code, which can be re-used to extend the reported analyses with other measures of sketch maps (see Data and Software Availability section).

Participants
We recruited 34 participants using the university mailing list but provided no financial or course credit compensation. One participant did not complete all tasks and one participant reported learning something new about the environment between the trials; their results were therefore excluded from the analysis.
The remaining 32 participants were aged between 19 and 29 (M = 22.8 years) and 18 of them were female. All participants were German native speakers, rode their bikes through the city regularly, and had been living in the city for at least one year at the time of participating in the experiment (M = 4.8 years). Most participants were students, but we did not allow students and members of the Institute for Geoinformatics to participate.
Power analysis. We conducted a power analysis using the simr R package (Green & MacLeod, 2016) for simulated (not observed) results of the metric accuracy analysis. Our sample size of 32 participants had a 97.6% power to detect a medium effect size (i.e., of 0.5 SD) in one variable, when the other effects were kept at 0 and the model's residual standard deviation at 0.5.

Experimental design
The experiment followed a within-subject 2 × 2 design, manipulating the sketch map receiver's familiarity with the area (familiar vs unfamiliar) and the receiver's purpose of the trip (urgent accident vs relaxed trip). Each participant produced 4 sketch maps in total, all for the same environment and the same pair of start and destination locations. In order to prevent participants from mechanically reproducing their own sketches without attending to the instructions, the 4 drawing tasks were split into two experimental sessions and assigned to one of two possible route directions. Consequently, each participant attended 2 sessions. During each session, each participant drew 2 sketch maps for the same pair of start and destination locations: each with a different drawing instruction, and each in a different direction. The order of these tasks and the route direction were counterbalanced: Each task instruction was assigned an equal number of times in each route direction (Hospital to Ludgeri and Ludgeri to Hospital) and at each possible position in the order (as 1st, 2nd, 3rd, or 4th). The order of task instructions was also counterbalanced within the sessions, so that each instruction was equally often presented as first and as second within a session.
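One way to satisfy the counterbalancing constraints described above can be sketched in Python. This is an illustrative construction only, not the assignment procedure actually used in the study: the cyclic rotation of task orders, the alternation of directions, and the name `build_schedules` are all assumptions introduced here.

```python
from collections import Counter
from itertools import product

FAMILIARITY = ("familiar", "unfamiliar")
PURPOSE = ("accident", "relaxed")
DIRECTIONS = ("Hospital to Ludgeri", "Ludgeri to Hospital")


def build_schedules():
    """Return 8 hypothetical schedules of (session, position, condition, direction)."""
    conditions = list(product(FAMILIARITY, PURPOSE))  # the 2 x 2 = 4 instructions
    schedules = []
    for shift in range(4):
        # Cyclic rotation: every condition occupies every position once.
        order = conditions[shift:] + conditions[:shift]
        for first_direction in (0, 1):
            schedule = []
            for position, condition in enumerate(order):
                session = position // 2 + 1  # tasks 1-2 in session 1, 3-4 in session 2
                # Alternate directions so both occur within each session.
                direction = DIRECTIONS[(first_direction + position) % 2]
                schedule.append((session, position + 1, condition, direction))
            schedules.append(schedule)
    return schedules


schedules = build_schedules()
# Tally how often each condition lands at each position and in each direction.
position_counts = Counter((c, p) for s in schedules for _, p, c, _ in s)
direction_counts = Counter((c, d) for s in schedules for _, _, c, d in s)
```

With 32 participants, each of the 8 schedules would be used 4 times, so every instruction appears equally often at each position and in each direction.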

Material
We created 2 × 2 = 4 route instructions (Table 1). Additionally, we created a reversed-direction version of each of them (not shown in the table). The environment and the shortest route between the two destinations (length of 2.7 km) are depicted in Figure 2.

Procedure
Participants were invited to a laboratory room either alone or in groups of two; each was seated at a separate desk facing opposite walls, without direct visual access to the other desk. Before beginning the experiment, they were asked to sign an Informed Consent Form, following the ethical clearance of the Institute's Ethics Committee. Participants were then shown photographs of two prominent sites in the city, the Ludgeri roundabout and the University Hospital, and were asked whether they knew the location of those places. If they confirmed (all did), they were introduced to the procedure of the experiment. The experimenter read out the instruction twice and additionally provided a written copy. Participants were told to include as much information as they considered necessary. They were given access to an unrestricted amount of DIN A4 paper, using an average of 1.85 paper sheets per sketch map. The time for each drawing was not limited. Participants were also free to choose the exact route between the start and destination locations. After they reported finishing, the experimenter read out the second instruction and participants drew the second sketch. At the end they were reminded to attend the second session exactly one week later. The second session always began by asking participants whether they had learnt anything new about this environment since the previous session. If their answer was negative, the procedure was repeated for the remaining instructions.

Data analysis
We analyzed sketch maps' type to evaluate Hypothesis 1 and sketch maps' metric accuracy to evaluate Hypothesis 2. We used the method by Krukar et al. (2018) to analyze sketch maps' type, and the method of bidimensional regression (Friedman & Kohler, 2003; Tobler, 2010) implemented in the Gardony Map Drawing Analyzer (Gardony et al., 2016) to analyze their metric accuracy.
We analyzed each sketch map's type using the route-likeness/survey-likeness classification developed by Krukar et al. (2018). Within this method, each sketch map receives two scores, on a scale 0-6: one for its route-likeness, and one for its survey-likeness. Route-likeness is determined based on the complexity of information relevant to communicating route-based directions. One point is scored for each of the following that the sketch contains: (1) a continuous route, (2) identifiable turns, (3) side streets at decision points (min. 2 instances), (4) side streets outside decision points (min. 2 instances), (5) local landmarks at decision points, and (6) local landmarks outside decision points. Survey-likeness is determined based on the complexity of information relevant to communicating the broader context of space, beyond the route; this type of information supports global orientation and allows taking shortcuts. One point is scored for each of the following that the sketch contains: (1) a global point-like landmark, (2) a global line-like landmark (e.g., a railway), (3) a global regional landmark, (4) a fragment of a street network beyond the route, (5) containment hierarchy (e.g., landmarks marked inside regions), and (6) explicitly marked spatial relations between two distant objects (e.g., a U-shaped street network with a building in its center, where the building is clearly marked between two otherwise opposite and disconnected streets). Note that this method does not consider the correctness of sketch maps but formalizes a traditional approach of analyzing sketch types (Appleyard, 1970). An evaluation of the method's reliability was provided by Krukar et al. (2018). Following the discussion therein, we introduced a threshold of two instances for criteria that otherwise would be fulfilled by all, or almost all, sketch maps.
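The scoring scheme described above can be sketched as a simple checklist. This is a minimal illustration, assuming a human coder has already counted the instances of each feature on a sketch; the criterion identifiers and the function name `likeness_score` are hypothetical and not taken from Krukar et al. (2018).

```python
ROUTE_CRITERIA = (
    "continuous_route",
    "identifiable_turns",
    "side_streets_at_decision_points",
    "side_streets_outside_decision_points",
    "local_landmarks_at_decision_points",
    "local_landmarks_outside_decision_points",
)
SURVEY_CRITERIA = (
    "global_point_landmark",
    "global_line_landmark",
    "global_regional_landmark",
    "street_network_beyond_route",
    "containment_hierarchy",
    "relations_between_distant_objects",
)
# Criteria that need at least two instances to score (see the text above);
# all other criteria score with a single instance.
MIN_INSTANCES = {
    "side_streets_at_decision_points": 2,
    "side_streets_outside_decision_points": 2,
}


def likeness_score(feature_counts, criteria):
    """Return a 0-6 score: one point per criterion that meets its threshold."""
    return sum(
        1
        for name in criteria
        if feature_counts.get(name, 0) >= MIN_INSTANCES.get(name, 1)
    )


# Hypothetical coding of a single sketch map (feature name -> number of instances).
example_sketch = {
    "continuous_route": 1,
    "identifiable_turns": 4,
    "side_streets_at_decision_points": 1,  # below the two-instance threshold
    "local_landmarks_at_decision_points": 3,
    "global_line_landmark": 1,
}
route_likeness = likeness_score(example_sketch, ROUTE_CRITERIA)
survey_likeness = likeness_score(example_sketch, SURVEY_CRITERIA)
```

Both scores count the diversity of feature types present, not how correctly they are drawn, which mirrors the note above that the method does not consider correctness.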
We analyzed metric accuracy of sketch maps using bidimensional regression (Friedman & Kohler, 2003; Tobler, 2010) implemented in the Gardony Map Drawing Analyzer (Gardony et al., 2016). We used the measure of r, which reflects the degree of correspondence between the configuration of landmarks on the sketch and the configuration of the same landmarks on the metric map of the city. The value of r approaches 1 if the configurations are identical. This method reflects the accuracy of metric information in sketch maps because a high value of r depends on preserving correct relative distances and angles between landmarks. It is, however, unaffected by consistent scaling or rotation of the entire configuration. A sketch drawn upside-down and scaled-down, compared to the metric map, could receive a score of 1 if the relative distances and angles between the landmarks are preserved. In the analysis, we coded the location of all landmarks drawn by each participant, also treating the most often drawn junctions as landmarks (the method is more reliable when considering a larger number of objects). For landmarks drawn as polygons we manually approximated their centroid. For each sketch map, we determined its r value by comparing it to the metric map of the environment. Therefore, each sketch map received an r score between 0 and 1, which was determined by the correctness of the configuration of landmarks and junctions that the participant chose to include in their sketch map. Following the recommendations of Gardony et al. (2016), we did not analyze sketch maps that contained fewer than 8 landmarks.
In order to verify the hypotheses statistically, we tested whether the familiarity, purpose, or their interaction had an effect on the route-likeness, survey-likeness, and bidimensional regression scores of sketch maps. We implemented the model-comparison method of hypothesis testing using the brms R package (Bürkner, 2017), which is based on Stan (Carpenter et al., 2017), within the framework of Bayesian mixed-effect models (McElreath, 2016). This method is robust to potential missing data, i.e., it does not require dropping the entire participant if one sketch map contained fewer than 8 landmarks and was therefore not scored. We implemented two mixed-effect "cumulative" family models (suitable for ordinal data) for route-likeness and survey-likeness scores, as well as one mixed-effect "gaussian" family model for bidimensional regression scores. All models included random slopes across familiarity, purpose, and their interaction, as well as random intercepts across participants. We implemented corresponding null models (intercept-only, with random intercepts across participants) and derived Bayes Factors for the comparison of each null model with the corresponding unrestricted model. The result of this comparison informs whether purpose, familiarity, or their interaction had an impact on sketch map scores. We interpret the values of Bayes Factors following the classification by Jeffreys (1961).
In order to check which specific conditions differed significantly (i.e., for an equivalent of post-hoc tests), we calculated posterior means and checked whether the 95% Highest Posterior Density (HPD) interval of each difference between conditions excludes 0. We summarize these results in text but provide tables with numerical details of the models, as well as pairwise post-hoc comparisons, in Appendix A.
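The invariance to rotation and scaling described above can be illustrated with a minimal sketch of the Euclidean form of bidimensional regression. It is written with NumPy and is not the Gardony Map Drawing Analyzer code; whether that tool computes r in exactly this way is an assumption, and the landmark coordinates below are invented for illustration.

```python
import numpy as np


def bidimensional_r(reference_xy, sketch_xy):
    """Euclidean bidimensional regression r between two landmark configurations.

    Both inputs are (n, 2) arrays with landmarks listed in the same order.
    """
    ref = np.asarray(reference_xy, dtype=float)
    sk = np.asarray(sketch_xy, dtype=float)
    if ref.shape != sk.shape or ref.ndim != 2 or ref.shape[1] != 2:
        raise ValueError("expected two (n, 2) arrays of matching shape")
    # Treat each point as a complex number; centering removes translation.
    z = ref[:, 0] + 1j * ref[:, 1]
    w = sk[:, 0] + 1j * sk[:, 1]
    z = z - z.mean()
    w = w - w.mean()
    # np.vdot conjugates its first argument, so this is |sum(conj(z) * w)|.
    numerator = abs(np.vdot(z, w))
    denominator = np.sqrt(np.vdot(z, z).real * np.vdot(w, w).real)
    return float(numerator / denominator)


# Hypothetical coordinates of 8 landmarks on the metric map.
map_xy = np.array(
    [[0, 0], [1, 0], [2, 1], [3, 3], [1, 4], [0, 2], [4, 1], [2, 5]], dtype=float
)
# A sketch drawn upside-down, at half scale, and shifted on the page.
upside_down = -0.5 * map_xy + np.array([10.0, 10.0])
# A sketch with one landmark clearly misplaced.
distorted = map_xy.copy()
distorted[3] += np.array([1.5, -2.0])

r_upside_down = bidimensional_r(map_xy, upside_down)
r_distorted = bidimensional_r(map_xy, distorted)
```

In this formulation, r equals 1 (up to floating-point error) for any configuration that is a translated, rotated, and uniformly scaled copy of the reference, and drops below 1 once relative positions are distorted.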
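The verbal labels attached to Bayes Factors in the Results can be reproduced with a small lookup. The thresholds (3, 10, 30, 100) follow Jeffreys' classification; the label wording below is the commonly used relabeling of those categories and is an assumption chosen to match the terms used in this paper. The function name `evidence_label` is hypothetical.

```python
def evidence_label(bf10):
    """Map a Bayes Factor (alternative vs. null) to a (strength, favored model) pair."""
    if bf10 <= 0:
        raise ValueError("a Bayes Factor must be positive")
    favored = "alternative" if bf10 >= 1 else "null"
    # Evidence for the null is expressed as 1/BF, e.g. BF = 1/18.81.
    strength = bf10 if bf10 >= 1 else 1.0 / bf10
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return label, favored


# Bayes Factors reported in the Results section.
reported = {
    "H1a": evidence_label(1 / 18.81),
    "H1b": evidence_label(3.45),
    "H2a": evidence_label(68.88),
    "H2b": evidence_label(80.11),
}
```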
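The HPD check described above can be sketched as follows, assuming one already has posterior draws of a difference between two conditions (for example, exported from the fitted model). The helper names are hypothetical, the draws below are simulated rather than taken from the study, and the shortest-interval approach assumes a unimodal posterior.

```python
import numpy as np


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))  # number of draws the interval must cover
    if k < 2 or k > n:
        raise ValueError("not enough samples for the requested mass")
    # Width of every candidate interval spanning k consecutive sorted draws.
    widths = x[k - 1:] - x[: n - k + 1]
    start = int(np.argmin(widths))
    return float(x[start]), float(x[start + k - 1])


def excludes_zero(samples, mass=0.95):
    """True if the HPD interval lies entirely above or entirely below 0."""
    low, high = hpd_interval(samples, mass)
    return low > 0 or high < 0


# Simulated posterior draws of two hypothetical condition differences.
rng = np.random.default_rng(0)
clear_difference = rng.normal(loc=0.5, scale=0.1, size=4000)
no_difference = rng.normal(loc=0.0, scale=1.0, size=4000)
```

A difference whose interval excludes 0 would be reported as significant in the sense used in this paper; an interval that spans 0 would not.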

Results
Four sketch maps drawn by a sample participant are depicted in Figure 3. The online repository accompanying the manuscript contains all sketch map scans (see Data and Software Availability section).

Sketch map type
Table 2 shows raw data, i.e., the number of sketch maps that received each of the scores (0-6) on both scales (route-likeness and survey-likeness) across four conditions. In general, route-likeness was high. Within the survey-likeness scale, it appears that sketch maps in the unfamiliar-relaxed condition scored higher on average. In the following analyses, we tested the statistical significance of these differences.

Route-likeness
We tested Hypothesis 1a by comparing the model explaining the effect of familiarity, purpose, and their interaction on the route-likeness score of sketch maps with a null model. There was strong evidence (BF = 1/18.81) against Hypothesis 1a. Route-likeness scores did not significantly differ across conditions. Figure 4 visualizes the posterior mean estimates plotted for simplicity on a continuous (not ordinal) scale. Please refer to Appendix A for ordinal-scale visualizations.

Survey-likeness
We tested Hypothesis 1b by comparing the model explaining the effect of familiarity, purpose, and their interaction on the survey-likeness score of sketch maps with a null model. There was moderate evidence (BF = 3.45) in favor of Hypothesis 1b. Survey-likeness scores significantly differed across conditions. The 95% HPD of the estimated difference between the unfamiliar accident and the unfamiliar relaxed condition excluded zero and can be considered significant. Figure 5 visualizes the posterior mean estimates plotted for simplicity on a continuous (not ordinal) scale. Please refer to Appendix A for ordinal-scale visualizations.

Metric accuracy
The r value was derived using Gardony Map Drawing Analyzer (Gardony et al., 2016) for 108 sketch maps. It ranged from 0.68 to 0.99 (M = 0.94, SD = 0.05). Table 3 presents the summary of raw results. As visible, the average accuracy of sketch maps was high, which is unsurprising given that the participants had to draw prominent landmarks and popular routes in an environment they know very well.
We standardized and mean-centered the r variable in order to facilitate model fitting. We tested Hypothesis 2a by comparing the model explaining the effect of familiarity, purpose, and their interaction on the r (standardized) score of sketch maps with a null model. There was very strong evidence (BF = 68.88) in favor of Hypothesis 2a. The 95% HPD of the estimated difference between the unfamiliar accident and the familiar accident condition excluded zero and can be considered significant. Figure 6 visualizes the posterior mean estimates. For post-hoc comparisons please refer to Appendix A. Differences between conditions in the r (standardized) score can be interpreted similarly to Cohen's d effect sizes.

Within-person variability
Hypothesis 2a could be confirmed even if only a small number of participants drew highly different sketch maps, while the rest drew similar sketch maps in all conditions. We were interested in whether it can be said that most participants drew different sketch maps across conditions. For this reason, we performed an additional analysis of these data. Within Hypothesis 2b, we investigated whether participants created sketch maps with different metric errors by comparing the above model with another alternative. The alternative model contained no by-participant slopes for the effect of familiarity, purpose, and their interaction. This means that the alternative model assumed that each person might have a different metric accuracy, but that the accuracy of a single person would not significantly differ across conditions. The alternative model performed worse than the original one.
There was very strong evidence (BF = 80.11) in favor of Hypothesis 2b: single participants did draw differently accurate sketch maps in different conditions.

Discussion
Participants produced different sketch maps, depending on the drawing task instruction. This result confirms our main hypothesis and is in line with the situated view of spatial cognition. Participants externalize information differently when they are asked for it multiple times with different task instructions, even if the tasks refer to the same area in the same, well-known city. Sketch maps drawn under different experimental instructions were of a different type: They differed in their survey-likeness (H1b) but not their route-likeness (H1a). This is not surprising because of the route-focused drawing task given to the participants, who were always asked to draw route maps. The fact that survey-likeness differed depending on the familiarity and purpose of the receiver supports evidence from verbal route instructions that identified spatial context as a factor differentiating route instructions provided for different receivers (Hirtle et al., 2011).
Taken together, the results for Hypotheses 2a and 2b suggest that although the average metric accuracy was high, participants draw sketch maps of different accuracy when asked for it multiple times. As visible in Figure 6, the difference between unfamiliar and familiar conditions was approximately 0.5 on the standardized scale. This can be interpreted similarly to Cohen's d effect size, which for the value of 0.5 is considered a medium effect size (Cohen, 1988). Note that high proficiency in such tasks might be associated with lower variability, so this effect is likely to be more pronounced if participants had to draw areas they do not know that well.
This result may seem in contrast to the seminal study by Blades (1990), but Blades could not consider this issue. The analyses therein correlated the number of road names, landmarks, and road segments included in two sketches produced on two occasions by the same participants, and relied on independent judges asked to match anonymized sketch maps based on their presumed authorship. These analytical methods demonstrate that two sketches produced by a single participant are similar (or sufficiently distinct in drawing style from other participants' sketches). However, they do not demonstrate that they are equally good: The analyses do not consider the accuracy of the contained information. This could have varied across the trials, even if two sketches contained a similar amount of information and looked alike thanks to an individualized drawing style. It bears noting that the result presented in the current paper does not invalidate the claim that sketch maps are a reliable tool for extracting spatial memory. However, our result points to the need to understand naturally occurring variability in the accuracy of sketch maps. This is necessary especially for standardizing and automating sketch map analysis.
At least two theoretical explanations of variability in accuracy are possible. First, participants may extract information from a stable mental representation of the environment, but due to factors such as drawing skills, may be unable to do it equally accurately when drawing sketch maps multiple times. This would indicate that the variability in performance is a measurement error of the sketch mapping task: intrinsic to the tool and avoidable by using other methods. Alternatively, participants might not hold a single, stable, globally consistent metric mental representation of the environment and instead extract information onto a sketch map by re-constructing distant metric relations (e.g., by combining erroneous local relations). This would result in different accuracy at each trial that would also be pronounced if such information were extracted with different methods. Finally, a combination of the two explanations is also possible (i.e., participants drawing multiple sketch maps do so with some measurement error, as well as from task-dependent mental representations). The first explanation is the least likely given the evidence in the spatial memory domain (Warren et al., 2017). Future work should focus on disentangling especially the latter two possibilities in an effort to isolate the potential measurement error of the sketch mapping task.
Moreover, the experimental instructions seem to have a systematic effect on variability in accuracy: drawing sketch maps for a receiver unfamiliar with the environment was associated with lower metric accuracy. This exploratory result is counter-intuitive because a receiver who does not know the environment well may in fact require a more accurate sketch map. However, when considering that these experimental conditions also significantly increased the survey-likeness of the sketches, it seems plausible that participants draw different (more survey-like) sketch maps even "at a cost" of their metric accuracy. The role of the sketch map is not to depict the reality as closely as possible, but to schematize it in a manner appropriate for the task (Tversky, 2002). The result signifies that providing orientation with respect to the broader environment is considered a valuable feature of navigational instructions. This is in line with the suggestions made by Schwering et al. (2017), who claimed that staying oriented with respect to the broader environment is a desired feature of route instructions, and that human-inspired wayfinding systems should support it better.
Our results can also be interpreted in the context of psycholinguistic work on audience design (Clark & Murphy, 1982). If sketch maps with route directions are considered a form of communication, their creation might be affected by factors similar to those that have been shown to influence language production. For example, when communicating information, people routinely overimpute their own knowledge to others, assuming that the audience knows the same things as they do (Nickerson, 1999). This could potentially make sketch maps worse if task instructions mentioned factors that increase overimputation of one's own knowledge, such as a shared past and shared experiences. Kingsbury (1968) found that people tend to give less detailed route directions to those who are identified as being local, rather than from out of town. Our results do not fully confirm this line of explanation: sketch maps drawn for familiar receivers in our experiment were more accurate. However, sketch maps drawn for an unfamiliar-relaxed receiver had higher survey-likeness. This can indicate that drawers modify sketch maps for unfamiliar receivers in a non-obvious way: by increasing the variety of information types (higher survey-likeness) but not the metric accuracy of their placement on the sketch. This further supports the view that metric precision is not critical to the quality of a sketch map as a communication tool.
Nickerson (1999) mentioned an interesting asymmetry in the process of knowledge imputation that can be relevant to how sketch maps are used in spatial cognition research: one can only impute to the receiver the knowledge that one already has. Since overimputation of knowledge might manifest itself as information missing from the message, such messages might be a bad indicator of one's true knowledge. In the spatial cognition context, this would indicate that sketch maps should not be taken as a valid representation of the producer's entire spatial knowledge if overimputation occurred. Many important pieces of information (landmarks, regional boundaries, spatial relations) could have been assumed to be known by the receiver and therefore left out in the sketching process. Therefore, if the goal is to use sketch maps as an indicator of the drawer's spatial knowledge, instructions should be framed in a way that reduces the possibility of overimputation as much as possible. What is currently unknown is which spatial relations are generally assumed to be obvious (and therefore potentially overimputed to receivers) by the majority of participants.
The methodological implication of the current paper is that a meta-analysis of studies with sketch maps would require considering task instructions as a confounding variable. Although the majority of studies seem to apply some version of a task instructing participants to "Describe the route to a person unfamiliar with it," it remains uncertain how the sketch map drawing task is interpreted when the drawing instruction is less precise about its purpose or its potential receiver (e.g., "Draw the map of the environment").
Theory-wise, our findings are in line with spatial memory theories that propose a stable representation of qualitative relations, but a vague, approximate, or local-only memory of metric spatial information (Meilinger, 2008; Tversky, 1993; Warren et al., 2017). Since sketch maps in our experiment were drawn based on the mental representation of the same environment, the findings suggest that participants partially guess or approximate metric aspects of their sketches. Our results also confirm the situated nature of human spatial cognition. For example, Hölscher et al. (2011) showed that participants produce different route instructions for themselves and for another person. Depending on the context of the task, they either relied on graph-like relations or on perceptually accessible information along the route. The behavioral pattern observed in our data follows a similar principle: participants pick what to communicate, depending on the context of the task. As a result, they produce varying error patterns.

Limitations
The main contribution of the paper is to demonstrate that task instructions have an effect on the type and metric accuracy of sketch maps, but the comparisons between individual conditions should be treated as exploratory. One reason for this is that the instructions did not fully isolate the familiarity and purpose effects: the familiar-accident instruction mentioned a sister, which introduced a personal dimension that was missing in the other conditions (although note that this condition did not significantly differ from the others in any way that would suggest that such a confounding factor played a role). Moreover, in our experiment, we did not time individual trials and therefore cannot exclude the possibility that the accident condition was associated with faster sketching (e.g., in the context of higher survey-likeness of unfamiliar-relaxed sketch maps, compared to unfamiliar-accident ones).
Another limitation could be associated with the demand characteristics of our experimental design. It is possible that participants who were asked to draw multiple sketch maps assumed these sketch maps should not be identical. Our experiment partially controlled for that by collecting sketch maps in two sessions. It would be difficult for a participant attending Session 2 to remember exactly what they had drawn earlier. See Appendix B for an analysis split by the experimental session, task order, and familiarity with the city.

Conclusion
Participants produced different sketch maps, depending on the drawing task instructions. Sketch maps partially differed in their type (i.e., in survey-likeness but not in route-likeness) as well as in the accuracy of their metric information.
The first implication of this is that the exact drawing task matters. Most studies use the scenario of a "friend unfamiliar with the area visiting the city." We suggest that the familiarity and purpose of the "visiting friend" should always be explicitly stated (both in the experimental instruction provided to the participants and in the subsequent communication of the experimental procedure in the publication). Based on the alternatives considered in our study, it is clear that familiarity and purpose affect the type and metric accuracy of sketch maps, but other details of the drawing instructions (not investigated in this paper) might matter as well. For this reason, it is important to communicate such details, and to consider them when interpreting one's own experimental results against results obtained by other researchers (possibly using other instructions or instructions that were vague with regard to the sketch map's ultimate purpose).
The second implication is important for the design of tools and procedures for standardized and automated analysis of sketch maps. It seems that methods that rely on the metric accuracy of sketch maps may carry over the intrinsic variability of the task (i.e., the human inability to transform graph-based spatial knowledge into precise metric relations on a sketch map in a reliable and repeatable way) into their results. It might be misleading to conclude that two participants have different spatial knowledge if the difference between the metric accuracy of their sketch maps is within a certain range. Understanding the significance of metric errors in sketch maps ultimately requires relating between-subject variance to within-subject variance. Our study has demonstrated that this within-subject variance is significant. Future work should focus on isolating it and on developing more reliable analysis metrics.

Effect of task order. Participants drew sketch maps in a counter-balanced order. Visual inspection shows no apparent linear effect of task order (sketch maps were not getting systematically better or worse with each repetition).
Effect of familiarity with the city. Participants differed in the number of years they had lived in the city. As visible in the graphs, participants who had lived in the city much longer tended to have higher bidimensional regression scores and higher route-likeness, but not higher survey-likeness. We did not investigate the statistical significance of these effects because they are not related to our initial Research Question. The method used in the main manuscript to evaluate the stated hypotheses accounts for individual differences between participants.
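The conclusion's point about relating between-subject variance to within-subject variance can be illustrated with a small variance decomposition. The following is a minimal sketch, not the analysis used in this paper: it assumes a balanced design (every participant contributes the same number of metric accuracy scores) and uses a classical one-way random-effects estimate. The function name `variance_components` and the example scores are hypothetical.

```python
from statistics import mean


def variance_components(scores):
    """Split score variance into between- and within-participant parts.

    scores: dict mapping a participant id to a list of accuracy scores,
            one per sketch map (assumed balanced: equal list lengths).
    Returns (var_between, var_within, icc), where icc is the share of
    variance attributable to stable differences between participants.
    """
    groups = list(scores.values())
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 participants with equal numbers (>= 2) of scores")

    grand = mean(x for g in groups for x in g)
    # Mean squares of a one-way ANOVA with participant as the grouping factor.
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))

    var_within = ms_within
    # The between-participant component is truncated at zero if the estimate is negative.
    var_between = max(0.0, (ms_between - ms_within) / k)
    total = var_between + var_within
    icc = var_between / total if total > 0 else float("nan")
    return var_between, var_within, icc


# Hypothetical accuracy scores for three participants, four sketches each.
example = {
    "p1": [0.91, 0.85, 0.88, 0.79],
    "p2": [0.72, 0.80, 0.66, 0.75],
    "p3": [0.95, 0.90, 0.93, 0.97],
}
var_between, var_within, icc = variance_components(example)
print(f"between={var_between:.4f} within={var_within:.4f} icc={icc:.2f}")
```

Under these assumptions, a low ratio means that a difference between two participants' scores is hard to distinguish from the variation a single participant shows across repeated drawings.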

Figure 2. The map of the environment with the marked shortest cycling route between the Ludgeri roundabout (green pin on the right) and the University Hospital (red pin on the left). © OpenStreetMap contributors. Directions courtesy of FOSSGIS Routing Service.

Figure 6. Estimated posterior means of the bidimensional regression score.

Table 1. Drawing instructions used across experimental conditions.
unfamiliar-accident: [...] with a friend in a cafe next to the Ludgeri roundabout. Your friend is a doctor and moved to Muenster only a few days ago. While having a coffee your friend is suddenly called. A bad accident happened, and he needs to get back to the University Hospital as fast as possible. Unfortunately, he is not yet familiar with Muenster and hence asks you to draw a map of the route for him, so that he can ride his bike back to the University Hospital as fast as possible.
unfamiliar-relaxed: You are meeting a friend in the cafeteria of the University Hospital. Your friend is a doctor who moved to Muenster only a few days ago. While having a coffee, he asks whether you could explain him how to cycle to a new cafe next to the Ludgeri roundabout. He is not yet familiar with Muenster and would like to get to know a relaxed bike route there. Thus, he asks you to draw a map of the route.
familiar-accident: You are sitting with an old friend in the cafeteria of the University Hospital. Your friend is a doctor working and living in Muenster for several years. While having a coffee your friend is suddenly called. A bad accident happened in a new cafe near the Ludgeri roundabout, whereby his sister was badly injured. Your friend needs to go there as fast as possible but does not know the cafe. He asks you to draw a map of the route to get from the University Hospital to the cafe by bike as fast as possible.
familiar-relaxed: You are meeting an old friend in a cafe next to the Ludgeri roundabout. Your friend is a doctor and has been living in Muenster for several years. While drinking a coffee you mention that you know a relaxed bike route from the cafe to the main entrance of the University Hospital. He would like to get to know the route you are talking about and asks you to draw a map of it.

Table 2. The number of sketch maps that received each possible score (0-6) on the route-likeness and survey-likeness scales.

Table 3. Range, mean, and standard error of the bidimensional regression (r) scores across the conditions.
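For readers unfamiliar with the bidimensional regression score r reported here, the sketch below shows one common way to compute it. It assumes the Euclidean variant (the sketch is fitted to the reference map by translation, rotation, and uniform scaling), which may differ from the exact variant or software used in this study. The function name `bidimensional_r` and the coordinates are hypothetical.

```python
from math import sqrt


def bidimensional_r(reference, sketch):
    """Euclidean bidimensional regression correlation between two point sets.

    reference, sketch: equal-length lists of (x, y) tuples giving the
    positions of the same landmarks on the reference map and on the sketch.
    Returns r in [0, 1]; r = 1 means the sketch matches the reference up to
    translation, rotation, and uniform scaling.
    """
    if len(reference) != len(sketch) or len(reference) < 2:
        raise ValueError("need two equally long lists with at least 2 points")

    # Treat each point as a complex number so that rotation plus scaling
    # becomes multiplication by a single complex coefficient.
    z = [complex(x, y) for x, y in reference]
    w = [complex(x, y) for x, y in sketch]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    zc = [p - z_mean for p in z]
    wc = [p - w_mean for p in w]

    cross = sum(a.conjugate() * b for a, b in zip(zc, wc))
    ss_z = sum(abs(a) ** 2 for a in zc)
    ss_w = sum(abs(b) ** 2 for b in wc)
    if ss_z == 0 or ss_w == 0:
        raise ValueError("all points coincide in one of the configurations")
    return abs(cross) / sqrt(ss_z * ss_w)


# Hypothetical landmark coordinates: the sketch distorts one corner of a square.
reference_points = [(0, 0), (1, 0), (1, 1), (0, 1)]
sketched_points = [(0, 0), (1, 0), (1, 1), (0, 2)]
print(round(bidimensional_r(reference_points, sketched_points), 3))
```

Because the score is invariant to translation, rotation, and scale, it captures distortion in the relative placement of landmarks rather than differences in drawing size or orientation.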

Table A3. Posterior mean, standard error, and 95% credible interval of the cumulative model explaining the survey-likeness scores.

Table A2. Estimated means and 95% HPD of the route-likeness score, with contrasts between individual conditions. Contrast differences whose HPDs exclude 0 are reported as significant in the manuscript.

Table A4. Estimated means and 95% HPD of the survey-likeness score, with contrasts between individual conditions. Contrast differences whose HPDs exclude 0 are reported as significant in the manuscript.

Table A5. Posterior mean, standard error, and 95% credible interval of the model explaining the standardized r scores.

Table A6. Estimated means and 95% HPD of the metric accuracy measure r (standardized), with contrasts between individual conditions. Contrast differences whose HPDs exclude 0 are reported as significant in the manuscript.
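The table captions above treat a contrast as significant when the 95% HPD interval of the difference excludes 0. The following is a minimal sketch of that decision rule, not the authors' analysis code: it assumes the posterior of a contrast is available as a plain list of draws and takes the HPD interval to be the narrowest window containing the requested share of sorted draws. The names `hpd_interval` and `excludes_zero` and the example draws are hypothetical.

```python
from math import ceil


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the posterior draws."""
    draws = sorted(samples)
    n = len(draws)
    if n < 2:
        raise ValueError("need at least 2 posterior draws")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")

    # Number of consecutive sorted draws the interval has to cover.
    window = min(max(ceil(mass * n), 2), n)
    # Slide the window over the sorted draws and keep the narrowest one.
    start = min(
        range(n - window + 1),
        key=lambda i: draws[i + window - 1] - draws[i],
    )
    return draws[start], draws[start + window - 1]


def excludes_zero(interval):
    """True if the interval lies entirely above or entirely below 0."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical posterior draws of a difference between two conditions.
contrast_draws = [0.12, 0.30, 0.25, 0.41, 0.18, 0.33, 0.27, 0.22, 0.36, 0.29]
interval = hpd_interval(contrast_draws)
print(interval, excludes_zero(interval))
```

In practice the draws would come from the fitted model's posterior, with thousands of samples per contrast rather than the handful used here.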