1 Introduction

Gaining access to graphical information such as maps, graphs, and diagrams represents a longstanding challenge for blind and visually-impaired (BVI) people. Efforts to address this challenge can be traced back centuries, beginning with raised tangible graphics [1, 2]. Many approaches have endeavored to provide non-visual access to graphical information, with techniques ranging from simple paper-based tactile graphics to complex refreshable haptic displays (see [3–5] for detailed reviews). However, the majority of these approaches have not made significant inroads in reaching blind end-users, owing to a number of factors: they are static, expensive, and of limited portability, are cumbersome to author, and require a steep learning curve to master [6, 7]. With the recent advancement of touchscreen technologies, we estimate (based on informal surveys of participants in our lab and on discussions at blindness-related social/advocacy organizations) that 70–80 % of BVI people who use a cell phone are using touchscreen-based smartphones. As a result, many researchers and developers are employing touchscreen devices to provide BVI users with access to graphical information by capitalizing on the devices' built-in vibration and auditory features [8–10]. Touchscreen-based solutions suffer from unique challenges due to the perceptual limitations imposed by both haptic perception (e.g., low haptic resolution and lack of cutaneous information) and the hardware (e.g., a featureless glass display with limited screen real estate). Our previous work addressed these limitations through development of a touchscreen-based solution, called a Vibro-Audio Interface (VAI), and demonstrated that the VAI is a viable multimodal solution for learning various formats of graphical information such as graphs, polygons, and maps [7, 11–13]. Studies have also shown that simulating haptic feedback on touchscreen interfaces using additional hardware, such as vibro-tactors mounted on the fingers [14, 15, 17] or electro-static screen overlays [16, 18], is a potential solution for providing BVI people with non-visual access to graphical information.

Although promising, a major limitation of the VAI, and of all other touchscreen-based solutions, is the limited screen real estate of the underlying devices (ranging from ~ 3 to 18 in.), which constrains the amount of graphical information that can be presented simultaneously. A common method for presenting complex large-format graphical information (such as maps) is to group information based on its spatial proximity and relevance, and then to allow users to access this grouped information at different spatial and temporal intervals. These intervals are usually termed zoom levels, and the process of navigating between them is termed a zoom-in operation (navigating deeper into the rendering) or a zoom-out operation (navigating towards the top layer). For example, choosing Toronto as a location in Google Maps will yield different information granularity based on its representation at different zoom levels (see Fig. 1). Zoom level 0 (the lowest zoom level) yields an overview of the globe; as one zooms in to level 10, city names around Toronto become accessible, and by further zooming in to level 17, finer (deeper) details such as street names within the city become accessible. For the purpose of this paper, this scenario of navigating between different levels of information content is referred to as information-zooming.

Fig. 1. Google Maps displaying Toronto at zoom levels 0, 7, and 18

Zooming operations are also used for magnifying (scaling up) or shrinking (scaling down) a graphical rendering without affecting its topology. In general, magnifying is termed a zoom-in operation and shrinking a zoom-out operation. For the purpose of this paper, this alternative zooming scenario is referred to as image-zooming. Image-zooming does not involve navigating between different levels of information content, as is the case with information-zooming; instead, a single level of information content is either magnified or shrunk based on a fixed-step or a variable-step scale factor. In both scenarios, it is essential for users to integrate the different pieces of graphical information conveyed across zoom levels into a consolidated whole, i.e., a cognitive map. This information integration is especially important for BVI users, as challenges in non-visual environmental sensing often limit their access to information that is necessary for cognitive map development [6]. To overcome this issue, it is vital that BVI people can access spatial products such as maps to learn environmental relations. However, for this to work with touchscreen devices, they must be able to accurately integrate information between different zoom levels.
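The distinction between the two scenarios can be made concrete with a short sketch. This is purely illustrative; the class and member names (MapView, scale, levelIndex) are ours and do not come from any system described in this paper.

```java
// Illustrative only: contrasts the two zooming scenarios in minimal form.
public class MapView {
    private double scale = 1.0;     // image-zooming state: a scale factor
    private int levelIndex = 0;     // information-zooming state: a level index
    private static final int MAX_LEVEL = 2;

    // Image-zooming: the same content is rescaled; topology is unchanged.
    public void imageZoomIn(double stepFactor) {
        scale *= stepFactor;        // fixed-step (e.g., 1.25) or variable-step
    }

    // Information-zooming: a different level of information content is shown.
    public void infoZoomIn() {
        if (levelIndex < MAX_LEVEL) {
            levelIndex++;           // e.g., globe -> city names -> street names
        }
    }
}
```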

Sighted individuals can intuitively perform this integration process using various zooming techniques (e.g., pinch gestures), as rapid saccadic eye movements and a large field of view make the top-down grouping of map information relatively easy [3, 19]. By contrast, touch-based non-visual interfaces cannot directly employ these techniques, as haptic exploration is a slow, serial, and highly cognitively demanding process. In addition, finger-based gestures are the primary mode of accessing and learning the graphical elements, so they cannot simultaneously be used for performing zooming operations [20]. Moreover, haptic exploration requires investigating the map using a contour-following technique to determine whether to zoom in or out [21]. This process can be extremely inefficient and frustrating depending on the complexity and structure of the graphic [22, 23] and on how the zoom levels are implemented [24]. To appreciate this challenge, the reader is invited to try learning a map using zooming operations with their eyes closed. To overcome this challenge, one must learn the graphical elements at each zoom level independently and then integrate the levels to build a comprehensive mental representation of the map. To ease this non-visual integration process, the information across levels (at least across adjacent levels) should have meaningful relations [24] and include prominent features (landmarks) such that users can easily relate and integrate the levels [25]. The question remains open as to how non-visual graphical elements can best be learned at each zoom level and then integrated to build a global cognitive map. To our knowledge, no work to date has addressed this issue. Our motivation is to fill this gap in the literature by experimentally evaluating whether users are able to employ non-visual zooming operations to build a global cognitive map by integrating and relating graphical elements presented across multiple zoom levels.

2 Current Research

Many researchers have evaluated the use of traditional zooming techniques (known to work with visual displays) with non-visual interfaces. Notable work with haptic displays includes the use of fixed-step zooming for enlarging or shrinking virtual graphical images at a fixed linear scale [26, 27], identification of the number of steps (zoom levels) for optimal handling in haptic zoomable interfaces [28], and the use of logarithmic-step zooming that enlarges or shrinks the graphical image via electronic haptic displays [29]. Studies have also compared zoom levels in audio-tactile map exploration [10] and studied zooming operations using auditory cues to learn virtual environments [30]. While each of these projects demonstrated users' ability to perform traditional zooming techniques to achieve a particular task, they did not evaluate whether users were able to learn and develop a cognitive map of the given graphical material, which is our focus here. Another major limitation of these studies is that they did not address information-zooming scenarios; rather, evaluations were only made with image-zooming applications, which results in graphical elements being cropped or partially displayed while zooming from one level to another. Work by [31] addressed these issues with image-zooming scenarios and found that the use of what is called an intuitive zooming approach was more effective than the traditional fixed-step zoom approach [21], and that maintaining meaningful relations across zoom levels benefits tactile learning [24]. However, these studies did not evaluate whether users were able to develop a global cognitive map of the graphical renderings being apprehended. For a zooming method (or algorithm) to be truly useful, in addition to being intuitive and robust, it should also support exploration and integration of graphical elements across different zoom levels. Following this logic, a behavioral study was conducted to evaluate whether non-visual zooming methods support users in developing cognitive maps by facilitating non-visual exploration and integration of graphical elements across multiple zoom levels.

3 Experimental Evaluation

This study extends the use of our Vibro-Audio Interface for investigating graphical access and map reading, as evaluated in previous work [7, 12], by employing complex large-format graphical materials such that zooming or panning operations are necessary to perceive the layouts in their entirety. Twelve blindfolded-sighted participants (five males and seven females, ages 19–30) were recruited for the study. All gave informed consent and were paid for their participation. The study was approved by the Institutional Review Board (IRB) of the University of Maine. It is important to note that the use of blindfolded-sighted participants is justifiable here, as we are testing the ability to learn and represent non-visual material that is equally accessible to both sighted and BVI groups. In support, an earlier study with the VAI found no differences between blindfolded-sighted and BVI participants [7]. Indeed, inclusion of blindfolded-sighted participants is generally accepted as a normal first step in the preliminary testing of assistive technology (see [32] for discussion).

3.1 Experimental Conditions

Three different zoom-mode conditions were compared as part of this study, namely: (1) Fixed zoom, (2) Functional zoom, and (3) No zoom (control).

Fixed zoom. With this method, a single level of information content is either magnified (zoomed in) or shrunk (zoomed out) based on a fixed-step scale factor. The advantage of adopting a fixed-step zoom methodology is the redundancy of graphical elements across zoom levels, which facilitates the integration process: users can relate graphical elements by maintaining references (i.e., landmarks) between the zoom levels. Since touchscreens have limited screen real estate (i.e., a viewport), information content will inevitably extend beyond the viewport as one zooms in using this method. To facilitate access to the information beyond what is directly perceivable on the display, it is necessary to incorporate panning operations. From earlier work with the VAI, we found that a technique called two-finger drag was an intuitive and efficient approach for performing non-visual panning [12]. With this technique, users explore and learn graphical elements with their primary finger, and when panning is necessary they initiate or stop the panning mode by placing or removing a second finger on the screen. Once in panning mode, users can pan the map in any direction by dragging it with the two fingers synchronously. The advantage of this panning method is that users stay oriented and maintain their reference, as their primary finger is constantly in contact with the map (Fig. 2; a code sketch of the technique follows the figure).

Fig. 2. Fixed zoom demo (left), functional zoom demo (right)
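The following is a minimal sketch of how the two-finger drag technique could be implemented in a custom Android view. The class and field names are ours, and the actual VAI implementation may differ; the sketch only illustrates the mode switch driven by the second finger.

```java
import android.content.Context;
import android.view.MotionEvent;
import android.view.View;

// Sketch: a second finger toggles panning mode; the primary finger (index 0)
// stays on the map as a spatial reference while both fingers drag it.
public class PannableMapView extends View {
    private float offsetX = 0, offsetY = 0; // current pan offset of the map
    private float lastX, lastY;             // last position of the primary finger
    private boolean panning = false;

    public PannableMapView(Context context) { super(context); }

    @Override
    public boolean onTouchEvent(MotionEvent event) {
        switch (event.getActionMasked()) {
            case MotionEvent.ACTION_POINTER_DOWN:
                // Placing a second finger enters panning mode.
                panning = event.getPointerCount() == 2;
                lastX = event.getX(0);
                lastY = event.getY(0);
                break;
            case MotionEvent.ACTION_MOVE:
                if (panning) {
                    offsetX += event.getX(0) - lastX; // drag the map with the
                    offsetY += event.getY(0) - lastY; // synchronous two-finger motion
                    lastX = event.getX(0);
                    lastY = event.getY(0);
                    invalidate();                     // re-render the shifted map
                }
                break;
            case MotionEvent.ACTION_POINTER_UP:
                // Removing the second finger exits panning mode.
                if (event.getPointerCount() - 1 < 2) panning = false;
                break;
            case MotionEvent.ACTION_UP:
                panning = false;
                break;
        }
        return true;
    }
}
```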

Functional zoom.

The functional zoom method implemented here is adapted from what is termed "intuitive zooming", implemented on a tactile mouse-based display [24, 31], where the different zoom levels are based on an object hierarchy (see [24] for details). This zooming algorithm involves two rules: (1) objects that are close to each other are considered a meaningful grouping and are selected as a whole to be represented in a sub-graphic; otherwise, (2) individual objects are represented in their own sub-graphics. By adopting this algorithm, redundancy of graphical elements across zoom levels was avoided in this condition. For instance, zooming in to level 2 with functional zoom will show only the corridor segments and landmarks, whereas level 2 in the fixed zoom condition will show the corridor segments and landmarks along with the boundary (redundant from level 1). While the spatial relations between the two levels are explicit in the fixed zoom condition, they must be inferred between levels with the functional zoom technique. The redundancy was purposefully avoided in this condition to assess whether users would be able to interpret these spatial relations when they are not explicitly specified.
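Under our reading of these two rules, a grouping pass might look as follows. The single-link proximity clustering and the threshold parameter are our simplifying assumptions, not the exact algorithm of [24]:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two grouping rules behind functional ("intuitive") zooming:
// objects within a proximity threshold merge into one sub-graphic (rule 1);
// isolated objects each become their own sub-graphic (rule 2).
public class FunctionalZoomGrouper {
    public static class Obj {
        final double x, y;
        Obj(double x, double y) { this.x = x; this.y = y; }
    }

    public static List<List<Obj>> group(List<Obj> objects, double threshold) {
        List<List<Obj>> subGraphics = new ArrayList<>();
        for (Obj o : objects) {
            List<Obj> home = null;
            for (List<Obj> cluster : subGraphics) {
                for (Obj member : cluster) {
                    if (Math.hypot(o.x - member.x, o.y - member.y) <= threshold) {
                        home = cluster;   // rule 1: close objects form one group
                        break;
                    }
                }
                if (home != null) break;
            }
            if (home != null) {
                home.add(o);
            } else {                      // rule 2: isolated objects stand alone
                List<Obj> single = new ArrayList<>();
                single.add(o);
                subGraphics.add(single);
            }
        }
        return subGraphics;
    }
}
```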

No-zoom control condition.

To assess the influence of zooming operations on the spatial and temporal integration of graphical elements across zoom levels, a no-zoom condition was included as a control. In this condition, the entire indoor layout was presented to the user at a single zoom level using the VAI. To facilitate access to the complete map, which extended beyond the edges of the display, the two-finger drag panning method was also incorporated into this condition.

3.2 Experimental Stimuli and Apparatus

For all three conditions, the Vibro-Audio Interface was implemented on a Samsung Galaxy Tab 7.0 Plus tablet, with a 17.78 cm (7.0 in.) touchscreen serving as the information display. Three building layout maps were used as experimental stimuli (with two additional maps used for practice). Each map was composed of corridors, landmarks, and junctions, and had three levels of information: (1) a layer containing the exterior wall structure of the building, (2) a layer showing the corridor structure with the positions of important landmarks indicated, and (3) a layer showing the landmarks themselves (e.g., an Exit). The three maps were carefully designed to have the same complexity but different topology. Each map required the user to zoom into each of the three levels (and/or to pan in all four directions) in order to access the map in its entirety. Complexity was matched in terms of: (1) boundary structure, (2) number and orientation of corridor segments, (3) number of junctions, and (4) landmarks. Each map had three landmarks with names based on a standard building layout theme, e.g., entrance, exit, and restroom.
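As a rough illustration of this layered structure, the stimuli could be modeled as follows. The types and names are ours, not taken from the experimental software; the sketch is referenced again in the rendering example below.

```java
import java.util.List;

// Illustrative data model for the three-level stimulus maps.
public class LayeredMap {
    public static class Segment  { public float x1, y1, x2, y2; }
    public static class Landmark { public String name; public float x, y; }

    public List<Segment>  exteriorWalls; // level 1: building boundary
    public List<Segment>  corridors;     // level 2: corridor structure
    public List<float[]>  junctions;     // junction points {x, y} on corridors
    public List<Landmark> landmarks;     // level 3: named landmarks ("Exit")
}
```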

The maps were all rendered using previously established vibro-tactile parameters [7]. Line widths of 8.9 mm (0.35 in.) were used, corresponding to 60 pixels on the 7.0 in. touchscreen. Vibration feedback was provided using Immersion Corp's Universal Haptic Layer (UHL) (Immersion, 2013). The exterior walls were given a constant vibration based on the UHL effect "Engine1_100", which uses a repeating loop at 250 Hz with 100 % power. A pulsing vibration based on the UHL effect "Weapon_1" (a wide-band 0.01 s pulse with a 50 % duty cycle and a 0.02 s period) indicated the junctions. The corridors were rendered with a fast pulsing vibration based on the UHL effect "Engine3_100", which uses a repeating loop at 143 Hz with 100 % power. The landmarks were indicated by an auditory cue (a sine tone) coupled with the same fast pulsing "Engine3_100" vibration. In addition, speech output (e.g., the name of the landmark) was provided for junctions and landmarks upon tapping the vibrating region. Similarly, the zoom levels were indicated by speech output; for example, zooming in to level 2 from level 1 triggered the speech message "at corridor level".
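The rendering logic reduces to hit-testing the finger position against these layers and playing the matching effect. The sketch below assumes the UHL exposes a Launcher class with play(int)/stop() methods and integer constants named after the effects above; we have not verified these exact signatures, and the LayeredMap type comes from the sketch in the previous subsection.

```java
import android.content.Context;
import java.util.List;
import com.immersion.uhl.Launcher; // assumed UHL entry point

// Sketch: map the primary finger's position to the vibro-tactile effects
// described above. API details and hit-test geometry are our assumptions.
public class VibroAudioRenderer {
    private static final float HALF_WIDTH_PX = 30f; // 60 px (8.9 mm) lines

    private final Launcher launcher;
    private final LayeredMap map;

    public VibroAudioRenderer(Context ctx, LayeredMap map) {
        this.launcher = new Launcher(ctx);
        this.map = map;
    }

    // Called for each move of the primary exploring finger.
    public void onFingerAt(float x, float y) {
        if (nearAny(map.junctions, x, y)) {
            launcher.play(Launcher.WEAPON_1);     // pulsed: junction
        } else if (nearLine(map.exteriorWalls, x, y)) {
            launcher.play(Launcher.ENGINE1_100);  // constant 250 Hz: wall
        } else if (nearLine(map.corridors, x, y)) {
            launcher.play(Launcher.ENGINE3_100);  // fast pulsing: corridor
        } else {
            launcher.stop();                      // off the rendering: silence
        }
    }

    private boolean nearAny(List<float[]> points, float x, float y) {
        for (float[] p : points)
            if (Math.hypot(x - p[0], y - p[1]) <= HALF_WIDTH_PX) return true;
        return false;
    }

    private boolean nearLine(List<LayeredMap.Segment> segs, float x, float y) {
        for (LayeredMap.Segment s : segs) {
            float dx = s.x2 - s.x1, dy = s.y2 - s.y1;
            float t = ((x - s.x1) * dx + (y - s.y1) * dy) / (dx * dx + dy * dy);
            t = Math.max(0f, Math.min(1f, t));    // clamp to the segment
            if (Math.hypot(x - (s.x1 + t * dx), y - (s.y1 + t * dy)) <= HALF_WIDTH_PX)
                return true;
        }
        return false;
    }
}
```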

3.3 Procedure

A within-subjects design was used in the experiment. In each condition, participants learned a building layout map and then performed test tasks. Condition orders were counterbalanced, and the assignment of individual maps was randomized between participants. Each condition consisted of a practice, a learning, and a testing phase. The first practice trial in each condition was a demo trial in which the experimenter explained the zooming technique, task, goal, and strategies, and the participant explored the stimuli with corrective feedback from the experimenter. In the second practice trial, participants were blindfolded and asked to learn the complete map, then performed a test sequence without the blindfold. The experimenter evaluated the practice test results immediately to ensure participants correctly understood the tasks before moving on to the learning phase. During the learning phase, participants were blindfolded and instructed to explore and learn the map. While learning, they were allowed to switch back and forth between the zoom levels without restriction, and they were asked to indicate when they believed they had learned the entire map. Once they did so, the experimenter removed the device and proceeded to the testing phase, which consisted of three tasks: (1) landmark positioning, (2) inter-landmark pointing, and (3) map reconstruction.

In the landmark positioning task, blindfolded participants were asked to mark the position of a landmark after either zooming in or zooming out from one level to another. The task assessed the accuracy of participants' mental representation of the physical map, as correct performance required them to infer the spatial relations from their cognitive map. As an example: "from the landmark level (level 3), zoom out to the exterior wall level (level 1) and mark the position of 'Exit' with reference to its position on the exterior wall of the building". This task was excluded from the no-zoom control condition, as there was only one zoom level. The inter-landmark pointing task assessed the accuracy of participants' cognitive maps by asking them to indicate the allocentric direction between landmarks using a physical pointer affixed to a wooden board. Since participants never learned the straight-line direction between landmarks, they could only perform this Euclidean task by inferring the spatial relations from their cognitive map. Three pointing trials were tested for each map (e.g., indicate the direction from the entrance to the restroom), covering all three pairs of landmarks. Finally, in the reconstruction task, participants drew the map and labeled the landmarks on a template canvas of the same size as the original map; matching the reconstruction canvas to the device screen size gave participants a reference frame for the map's scale. From this design, seven experimental measures were evaluated as a function of zoom-mode condition.

3.4 Experimental Measures

Learning time.

Learning time reflects the level of cognitive effort imposed on the user while learning the map in each zoom-mode condition. It was measured from the moment participants first touched the screen until they confirmed that they had finished learning the map, and ranged from ~ 1.5 min to ~ 11 min (Mean = 304 s, SD = 153 s).

Positioning accuracy.

As discussed earlier, participants were asked to mark the position of landmarks from one zoom level onto another, and accuracy was measured by comparing the marked position to the actual position. The no-zoom control condition was excluded from this measure, as it had no zoom levels.

Positioning Time.

For the three positioning tasks, the time taken to identify a landmark using zooming operations was measured. As with positioning accuracy, this time was only compared between the functional zoom and fixed zoom conditions.

Pointing accuracy.

Angular errors were measured by comparing the reproduced angle to the actual angle between landmarks and were analyzed in two ways: unsigned (absolute) error and signed (relative) error, where under-estimation represents a negative bias and over-estimation a positive bias.
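For clarity, this scoring can be expressed in a few lines, assuming angles in degrees; the wrap step is our addition to keep the sign of the bias meaningful across the 0°/360° boundary.

```java
// Sketch of the angular-error scoring for the pointing task.
public final class PointingError {
    // Signed (relative) error: negative = under-estimation bias,
    // positive = over-estimation bias. Result is wrapped to [-180, 180).
    public static double signedError(double producedDeg, double actualDeg) {
        double e = producedDeg - actualDeg;
        return ((e + 180.0) % 360.0 + 360.0) % 360.0 - 180.0;
    }

    // Unsigned (absolute) error ignores the direction of the bias.
    public static double unsignedError(double producedDeg, double actualDeg) {
        return Math.abs(signedError(producedDeg, actualDeg));
    }
}
```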

Reconstruction accuracy.

The reconstructed maps were analyzed for whether they reflected the correct spatial configuration of the exterior walls and corridor segments, using a bi-dimensional regression method [33]. Thirteen anchor points (3 landmarks and 10 junctions) were chosen on each reconstructed map, and their degree of correspondence with the actual map was analyzed in terms of three factors: (1) Scale (i.e., magnitude of expansion or contraction), (2) Theta (i.e., rotation), and (3) Distortion Index (i.e., overall difference considering both scale and theta).
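For reference, a sketch of the standard Euclidean bi-dimensional regression underlying these factors (see [33] for the exact formulation, including the Distortion Index, which is derived from the unexplained variance of this fit): true anchor points $(x_i, y_i)$ are regressed onto reconstructed points $(u_i, v_i)$ via

```latex
\begin{pmatrix} u_i' \\ v_i' \end{pmatrix}
  = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
  + \begin{pmatrix} b_1 & -b_2 \\ b_2 & \phantom{-}b_1 \end{pmatrix}
    \begin{pmatrix} x_i \\ y_i \end{pmatrix},
\qquad
\text{Scale} = \sqrt{b_1^2 + b_2^2},
\qquad
\text{Theta} = \arctan\!\left(\frac{b_2}{b_1}\right)
```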

Landmark labeling accuracy.

Landmark labeling accuracy was measured from the reconstructed maps, with discrete scoring from 0 to 3 based on correctness (1 point for each correctly labeled landmark).

Subjective preference.

Participants were asked to rank the three conditions by preference (with one being most preferred). These data were analyzed to gauge which conditions were most liked.

4 Results

Performance data for each of the measures described above were analyzed and compared between the three zoom-mode conditions using a set of repeated-measures ANOVAs and post-hoc paired-sample t-tests. The F, t, and p values of these analyses are given in Table 1, along with the significant group comparisons. Overall, results demonstrated no significant differences between conditions across all measures tested except for learning time.

Table 1. ANOVA and paired sample t-tests results along with significant group comparisons

Learning time.

Results showed that participants took less time to learn in the functional zoom condition (M = 222.33 s, SD = 99.35) than in the fixed zoom condition (M = 335.83 s, SD = 145.95) or the no-zoom condition (M = 368.92 s, SD = 175.47). Learning time with functional zoom was significantly faster than with the other two conditions (ps < 0.05), which demonstrates the intuitiveness of this method. However, no significant difference was observed between the fixed zoom and no-zoom conditions. This is interesting because participants performed both zooming and panning operations in the fixed zoom condition, whereas they performed only panning in the no-zoom condition, suggesting that incorporation of a zooming operation did not impose any additional measurable cognitive load on the learning process.

Positioning Task.

Results showed no significant difference between the fixed and functional zoom conditions for either the relative positioning accuracy or the positioning time. This similarity is an important outcome, as fixed zoom (M = 27.55 s, SD = 6.6) was expected to outperform functional zoom (M = 25.44 s, SD = 5.11), given its advantage of providing a clear reference between corridors and landmarks at a single zoom level. The similarity of performance demonstrates that participants were able to accurately relate and reference graphical elements even when these were presented independently across zoom levels.

Pointing Task.

No significant differences (all ps > 0.05) were observed between the three zoom-mode conditions in pointing accuracy (for both signed and unsigned error), indicating that learning in all three conditions led to the development of a similar cognitive map. This outcome also suggests that none of the conditions led to reliably different cognitive biases (signed error) in the mental representation of the map.

Reconstruction Task.

Results of the bi-dimensional regression analysis likewise revealed no significant differences between zoom-mode conditions (all ps > 0.05) for the three factors evaluated: Scale, Theta, and Distortion Index (DI). A numerical difference in the scale factor suggests that participants generally perceived the map as smaller when apprehended in the zooming conditions (fixed M = 0.983, SD = 0.15; functional M = 0.975, SD = 0.11) but not in the no-zoom control condition (M = 1.05, SD = 0.16). This is likely because, at the start of learning, the rendering fit within the display frame in the zooming conditions (level 1) but extended beyond the frame in the no-zoom control condition, which may have created an illusion that maps in the no-zoom condition were bigger than those in the zooming conditions. However, this difference was not statistically significant, so it should be taken with a grain of salt. In addition, overall performance on the Theta and DI factors suggested that all three conditions led to the development of a similar cognitive map.

Subjective preference.

Participants preferred the zooming conditions over the no-zoom condition, with an equal level of preference for the two zooming conditions. We attribute this outcome to the fact that graphical elements were less cluttered in the zooming conditions than in the no-zoom condition. In line with this interpretation, seven of the twelve participants self-reported that it was easier to learn graphical elements as groups (zoom levels) rather than all at once.

5 General Discussion

Zooming operations are necessary for accessing complex graphical information such as maps within the limited screen real estate of touchscreen devices. However, implementing and performing zooming techniques with non-visual interfaces is difficult owing to the perceptual and cognitive challenges underlying touch-based exploration. Furthermore, with complex graphical information such as maps, the various graphical elements are often disconnected and rendered across multiple zoom levels. This paper addressed these challenges by investigating whether users could perform non-visual zooming operations and subsequently integrate graphical elements across zoom levels into a globally coherent spatial representation in memory. A usability study was conducted to assess users' ability to use two different zooming methods when learning indoor layout maps by integrating map information presented across three zoom levels. Accuracy of cognitive map development was evaluated by comparing exploration, learning, and spatial behaviors between three zoom-mode conditions, namely: (1) fixed-step zoom, (2) functional zoom, and (3) a no-zoom control condition. All six test measures required developing and accessing a cognitive map in order to infer spatial relations between graphical elements. We postulated that if users were able to effectively integrate and relate graphical elements across zoom levels in the zooming conditions, the resulting cognitive maps should be similar to those developed in the no-zoom control condition.

The most important outcome of the study is the similarity of performance observed across the pointing, positioning, and map reconstruction tasks between the three zoom-mode conditions, demonstrating that all three conditions led to development of a similar cognitive map. It is important to note that performance in the no-zoom control condition was not negatively influenced by the incorporation of a panning operation, as the fixed zoom condition also incorporated panning and exhibited similar (even numerically better) performance. Furthermore, the error performance found here across all conditions is consistent with previous work on touchscreen-based graphical access [7, 11–14], suggesting that learning in all three zoom-mode conditions led to development of an accurate cognitive map.

Learning time with functional zoom was significantly faster than with the other two conditions. Although the differences on the remaining measures were not statistically significant, the overall trend of the data suggests that the functional zoom technique performed best on all measures tested. This trend demonstrates that users were able to integrate and relate graphical elements even when the inter-element relations were not explicit. These findings are congruent with earlier work on intuitive zooming [24] and suggest that avoiding redundancy and maintaining meaningful graphical relations between adjacent zoom levels are critical for effective non-visual exploration and learning of complex graphical information using zooming operations.

One limitation of the current study is that the design did not require participants to perform zoom-out operations during the learning phase, and only one of the three positioning trials during testing involved a zoom-out operation. Although positioning performance did not significantly differ between the three trials, enforcing equal use of zoom-in and zoom-out operations during learning would likely strengthen users' ability to integrate graphical elements. Future work should address this limitation and compare more than three zoom levels to generalize the findings, which would also be more representative of complex graphical materials.

In conclusion, incorporating zooming and panning operations is an important first step in making digital graphics accessible to BVI people using touchscreen interfaces. We contribute to this effort by demonstrating that touch-based non-visual zooming operations support learning of complex large-format graphical materials and aid the development of accurate cognitive maps. As discussed earlier, a major challenge in providing traditional tactile map access to BVI users is the size of the physical maps and the expense and effort of authoring these products for non-visual access. We believe touchscreen-based multimodal interfaces are a viable solution to these challenges. Providing map access via such interfaces could have significant broader impacts on BVI independence by offering a new tool to promote environmental learning and wayfinding behavior.