1 Introduction

In the last twenty years, following Weiser's visionary ideas about ubiquitous computing [1], many research contributions have been implemented and transformed into real products, deployed and used outside of controlled environments such as laboratories or ad hoc experimental test beds. This has led to augmented environments to interact with, and to related research issues [28]. Moreover, technological innovations have allowed interactive displays to be installed in private, semi-public and (more interestingly for us) public places, such as fairs, shop windows, malls, workspaces, and public institutions. Interactivity is usually implemented by equipping displays with touchscreens, whereas cameras are used less often. This means that new technologies, e.g. Kinect-like devices [2] and the features they provide, are still not commonly adopted, especially in combination with public displays. As a consequence, interactive displays often do not exploit all the interaction possibilities offered by the cutting-edge technologies available today. We now have the opportunity to integrate Kinect-like devices, as well as other new technologies, into artifacts like public displays in order to explore touchless gestural interactions.

By using novel interaction techniques like touchless gestures, we can design for new scenarios: users will interact with wall-sized displays and, regardless of their abilities (e.g. motor impairments or the use of a wheelchair), they will be able to use gestures to get information from a public display. However, the main problem to be solved is finding a valid design methodology for gestures and interaction modalities. In order to make a product deployable in a wide range of social settings, studies cannot be conducted only in controlled environments; rather, they should take place directly “in-the-wild” (i.e. in the social contexts where public displays are typically deployed). This is one of the few ways to observe users’ behavior while taking into account different audiences and contexts, and how these influence users’ attitudes.

Moreover, different cultural backgrounds can affect the gestures used to achieve a particular goal. Therefore, a useful and robust gesture set for interaction with public displays should also be cultural-resilient. Finally, gestures should be guessable, so as to be easy to use for a wide audience. For these reasons, user-centered design appears to be the natural choice for defining and testing a proper method to obtain touchless gesture sets for public displays: gestures must be chosen by observing those adopted by users for various tasks and in different social settings. These observations and the subsequent selection of gestures can produce a cultural-resilient and guessable gesture set, as well as tips for improving the public display itself.

In the following sections of this paper, we explain the main challenges to be faced in developing a design methodology for touchless gestural interactions with public displays in-the-wild. These challenges are described in Sect. 2. In Sect. 3 we suggest some solutions to the issues raised in Sect. 2. Finally, we draw our conclusions in the form of indications for setting up useful test beds for the design of suitable and resilient gesture sets.

2 Main Challenges

In this section we describe the main challenges to be taken into account when defining a robust methodology for designing touchless gestural interactions. Many works have been published in the area of gestural interaction design, but to the best of our knowledge the majority of them focus on touch-based interactions. Nevertheless, some approaches can be inspiring for our goal, and the main challenges will be presented using these works as a starting point.

2.1 Touchless Gestural Interactions with Public Displays

Our work is mainly focused on the definition of a methodology for the design of touchless gestural interaction with public displays. Many authors have investigated the adoption of gesture sets in public contexts. Müller et al. [3] and Zhai et al. [4] have combined touchless gestures and touch interactions in order to create multimodal interfaces for public displays. This approach is interesting because it represents a real transition from the more common touch interactions to the novel touchless ones. However, touchless-only examples are rarer in the literature. Hardy et al. [5] deployed an interactive public display in a busy foyer at Lancaster University in the UK. Using a coarse gesture set, they analyzed the behavior of passers-by and how their attention level varied with respect to the displayed content and to the social setting. The authors found that users were more interested in interacting with the display when other people were already doing so than when nobody was in front of it. This result confirms what Brignull and Rogers defined as the honeypot effect, i.e. “the progressive increase in the number of people in the immediate vicinity” of the public display [26]. Hardy et al. also noticed that the gestures defined by the experimenters were slightly different from the ones actually used by passers-by. Moreover, the display position affected how users decided to interact with it: if they could see the display from a distance, users had more time to decide whether to watch, glance at, or completely ignore its content.

Hardy et al.’s work highlights two of the main problems of public displays: (1) users’ attention is not easy to attract; (2) touchless gestures defined by experimenters are often unsuitable. In the following sections, we investigate how these issues may be solved.

2.2 User-Derived Gestural Interface

One of the most interesting approaches for designing a gesture set is to base it on users’ preferences. This idea, which is closely related to the user-centered design research area, was put into practice by Wobbrock et al. [6] to develop a touch gesture set for surface computing. In their work, the authors presented a method carried out in a controlled environment: users were shown the effect of an action on the display (what they called the referent) and asked to propose the gesture (or symbol) that, in their opinion, would be the most suitable to achieve the same effect. In this way, the authors collected, classified and selected 48 user-defined gestures. By comparing this set with another one defined beforehand by the authors themselves (all experts in HCI and theoretically able to design suitable gestures), they discovered that only 60.9 % of their gestures matched the user-defined ones.

This is only one application of the so-called gesture elicitation studies [7]. This kind of approach has the advantage of capturing user preferences in a very direct way. However, it also has one main drawback: in order to maximize the validity of the results, it should be carried out in a real-life environment rather than a controlled one. In fact, since users’ behavior (and hence their preferences) is affected by the social context in which they operate [5, 8], a valid social context for eliciting gestures for public displays is a real place where such displays are typically deployed, outside of any kind of laboratory. The main challenge in using a touchless gesture elicitation methodology is then to collect a large amount of data on users’ behavior while minimizing any bias due to experimenter intervention. For this reason, we think that an in-the-wild implementation of gesture elicitation must be adopted.
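To make the notion of convergence among elicited proposals concrete, elicitation studies typically compute an agreement score for each referent, following Wobbrock et al. [6, 7]. The minimal Python sketch below implements this score on hypothetical data; the gesture and referent names are invented for illustration only.

```python
from collections import Counter

def agreement_score(proposals):
    """Agreement score for one referent, following Wobbrock et al. [6]:
    the sum over groups of identical proposals Pi of (|Pi| / |P|)^2,
    where P is the multiset of all proposals for that referent."""
    total = len(proposals)
    groups = Counter(proposals)
    return sum((size / total) ** 2 for size in groups.values())

# Hypothetical elicited proposals for two referents.
elicited = {
    "next_page": ["swipe_left", "swipe_left", "swipe_left", "push", "swipe_left"],
    "zoom_in":   ["pinch_out", "pull_apart", "pinch_out", "push", "wave"],
}

for referent, proposals in elicited.items():
    print(f"{referent}: A = {agreement_score(proposals):.2f}")

# Overall agreement is the mean across referents.
overall = sum(agreement_score(p) for p in elicited.values()) / len(elicited)
print(f"overall A = {overall:.2f}")
```

A referent with a low score signals that no shared convention exists among users, and it is precisely these referents that deserve further in-the-wild observation.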

2.3 Experiments In-the-Wild

In 2005, Sharp and Rehman coordinated the UbiApp Workshop [9], which aimed to define new practices for application-led research in the area of ubiquitous computing. In particular, experts in this research field agreed on how to evaluate ubiquitous applications, “arguing that the only way to evaluate an application against the ideals of ubiquitous computing […] is through long-term deployment in the wild. […] Small-scale lab studies still have a place - everyone agreed that they’re very useful in the early stages of user-centered design”, but “once researchers have performed lab-scale trials, they […] should use this data to continue to design, deploy, and evaluate similar applications on a larger scale” [9]. In other words, a fundamental outcome of the UbiApp Workshop was the need to evaluate applications outside of controlled environments, i.e. in-the-wild. This is particularly important when evaluating applications for public displays, because of the strong difference between a laboratory and the real settings in which these systems are deployed.

The need for longitudinal studies in-the-wild has been underlined by various authors since 2005 [10–12]. Ojala et al. [12] deployed various displays in public places and published the results of a three-year-long study. During this period, they continuously observed behavioral changes in users and collected new insights for improving display functionalities and contents. Such findings demonstrate how important longitudinal studies are for following users’ preferences. This is particularly true when studying public displays, because of their inherently “wild” nature.

In order to correctly capture users’ preferences while they interact with public displays, the possible presence of researchers or experimenters must be taken into account. In real situations, nobody invites users to interact, and they do not know what the interaction modality is. The presence of an experimenter who asks users to interact and explains how to do it allows much more data to be collected than in a totally uncontrolled situation. However, the intervention of an external agent on users’ behavior introduces a bias (what we refer to as experimenter bias). Solutions to this problem fall into two categories: (1) allow experimenter intervention and study the resulting bias; (2) avoid experimenter intervention and keep the environment uncontrolled.

Johnson et al. [13], who participated with the users in the activities to be evaluated, investigated the first option. They derived several dimensions along which the role of the researcher can be described, in terms of the ability to facilitate or encourage users and to explain the system, but also the level of authority and familiarity with participants and the experimenter’s relationship with the research. They concluded that participating and building a friendship with users can improve knowledge about how they see the system or prototype under study. Another class of approaches that require experimenter intervention are those in which users are asked to fill in questionnaires provided directly by the researcher during the experiment. In [14] questionnaires are given to users before and after the interactions, in order to evaluate their expectations and experience. In this way, the behavior is biased, but the bias can be estimated by comparing expectations (questionnaires administered before the experiment) with experience (questionnaires administered after the experiment). Furthermore, it is possible to collect much more data, which is relatively easy to analyze on the basis of the questionnaire answers.

According to [13, 14], allowing experimenter intervention provides researchers with several advantages. However, this approach inevitably introduces biases in users’ behavior, which is the main reason why avoiding experimenter intervention should be the preferred option. With their three-year-long deployment, Ojala et al. [12] demonstrated how much information is available without the experimenter’s intervention. They explicitly stated that laboratory, single-location and campus-wide deployments cannot capture the influence of location, so it is important to replicate experiments and evaluations over a wider area and in different locations. They are still evaluating their public displays, trying to overcome users’ hesitancy to use technology in public (another finding that can only be discovered and studied without experimenter intervention). Non-intrusive methods were also used by Messeter and Molenaar [15] to evaluate non-interactive ambient displays, basing all observations on data gathered from cameras and using a Wizard-of-Oz approach to edit the displayed content. They also interviewed users directly, but only at the end of the experiment, when their intervention no longer constituted a bias. In [16], researchers evaluated gestures used to interact with a tabletop computer: they blended into the crowd by wearing casual clothes and collected data using cameras.

However, the main drawback of experiments in which the researcher’s intervention is not allowed is that they require long-term studies, which usually imply high costs. This is probably the main reason why public displays are rarely evaluated with this approach, and usually with explicit researcher intervention instead. The main challenge here is to find methods that reduce costs and time, or that imitate the “experimenter blending in the crowd” [16] and the use of cameras. Such methods are not as simple as they seem, because of ethical issues such as users’ privacy and the need to inform users before gathering any personal information.

2.4 Display and Interaction Blindness

Evaluations in-the-wild depend on the applications or systems to be studied, but some issues of this evaluation approach are clearly related to the specific research topic. In particular, when investigating interaction with public displays, some phenomena can strongly increase the difficulty of gathering enough information. One of these issues has been called display blindness by Müller et al. [17]: similarly to banner blindness in web pages, it causes users not to look at displays because they expect the content to be advertising. The authors investigated which factors are related to this issue and proposed possible solutions. According to Müller et al., the factors that can mitigate display blindness are colorfulness or attractiveness, the amount of time the display is potentially visible to passers-by, and display size. However, this problem is not simple to solve, and may require the application of techniques from the persuasive computing area.

As noted by Ojala et al. in their longitudinal study, even when users notice the display they often do not interact with it “because they simply do not know that they can” [12]. This means that the interactivity of a public display is not as intuitive as expected, and interactions need to be explicitly enticed. Interaction blindness (as this phenomenon has been called) was also noticed by other authors working on interactive public displays [5], and without a solution it is impossible to experiment with any kind of gesture elicitation method, let alone in-the-wild. Ojala et al. suggest that “one way to overcome interaction blindness and entice interaction is make the interface more natural. Proxemic interactions are emerging as a potential paradigm for realizing natural interfaces […], but our simple visual proxemic cue […] (the “Touch me!” animation) did not noticeably increase user interaction” [12]. Proxemic interactions were introduced by Ballendat et al. [18], and they are closely related to (and actually based on) earlier work by Vogel and Balakrishnan [19]. In [18, 19] the authors propose systems that react to the user’s position and orientation, i.e. without any explicit interaction. This idea seems promising for solving interaction blindness, because users can easily perceive the interactivity of the display if its contents change in response to their movements. Indeed, proxemic interactions allow the implementation of more sophisticated solutions than a simple “Touch me!” animation, and further investigation is needed into how they can help to solve interaction blindness.
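As an illustration of how such proxemic reactions could be driven by a Kinect-like depth sensor, the following minimal Python sketch maps the tracked distance of a passer-by to interaction phases loosely inspired by Vogel and Balakrishnan [19]; the thresholds, phase names and contents are our own illustrative assumptions, not values from the cited works.

```python
def display_phase(distance_m: float) -> str:
    """Map a tracked user's distance to an interaction phase.
    Thresholds (in meters) are illustrative assumptions."""
    if distance_m > 4.0:
        return "ambient"    # far away: show general ambient content
    if distance_m > 2.5:
        return "implicit"   # acknowledge the passer-by (mirror cue)
    if distance_m > 1.2:
        return "subtle"     # invite interaction with a hint
    return "personal"       # close enough for full gestural interaction

def render(phase: str) -> str:
    """Hypothetical content shown in each phase."""
    content = {
        "ambient": "ambient content loop",
        "implicit": "content reacts to the user's position",
        "subtle": "hint that touchless gestures are available",
        "personal": "full gesture-driven interface",
    }
    return content[phase]

# Example: a user approaching the display.
for d in (5.0, 3.0, 2.0, 0.8):
    print(f"{d:.1f} m -> {display_phase(d)}: {render(display_phase(d))}")
```

The key design choice is that every phase change is triggered by implicit movement alone, so the passer-by discovers the display's interactivity without any instruction.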

Moreover, proxemic interactions can help users understand the features of an interactive public display by modeling it as a sort of mirror (one of the four mental models proposed in [20]). The mirror mental model has been shown to have a strong potential to catch users’ attention [20, 21], which suggests using it also as a partial solution to display blindness, in addition to interaction blindness.

2.5 Gesture Characteristics

Keeping display and interaction blindness in mind, an approach based on gesture elicitation in-the-wild should allow for the design of a gesture set that captures users’ preferences and expectations. In order to study and validate an elicited gesture set, it is mandatory to define which characteristics a gesture should incorporate. In their work on developing gestural interfaces, Nielsen et al. [22] investigate several features of gesture vocabularies, distinguishing between technology-based and human-based vocabularies. Gestures in the former are technically easy to recognize, but they are often tiring, sometimes impossible for some people to perform, and illogical with respect to the functionality they activate. By contrast, a human-based vocabulary (often developed with a user-centered design technique) should be easy to perform and remember, intuitive (a feature also known as guessability [6, 7]), metaphorically and iconically logical towards functionality, and ergonomic [22].

Furthermore, gestures should be usable in the social context in which they will be performed. In other words, social acceptability [8] must be strongly taken into account during the design of gestures. As stated by Rico and Brewster in [8], “with respect to gesture-based interfaces, users must evaluate if their motivation to use the technology outweighs the risk of looking strange or making a social blunder. In the face of such issues, gesture-based interfaces must be designed with an awareness of social context and social acceptability”. Investigating the social acceptability of a gesture is implicitly included in any evaluation of public displays in-the-wild without experimenter intervention: if users decide to interact by gestures, they implicitly categorize the interface as socially acceptable. With this in mind, the next challenge is the evaluation of gesture sets in a real social context, avoiding any experimenter intervention while taking into account ethical and privacy issues, which can easily become a primary obstacle.

3 Designing a User-Derived Gestural Interface In-the-Wild

As stated in the introduction of this paper, the main problem we aim to tackle is to find a valid methodology for designing touchless gestural interfaces in-the-wild. To this end, in this section we briefly summarize different approaches that might be suitable to address problems and opportunities related to the previously described challenges.

3.1 User-Derived Touchless Gestural Interface

We believe that a Wizard-of-Oz-based methodology can be effective for studying users’ interactions with public displays. In particular, by means of a GUI (Graphical User Interface), Wizard-of-Oz approaches are typically deployed in a two-step process. In the first step, users are implicitly invited to interact through gestures, thus overcoming interaction blindness: they can be unobtrusively observed via cameras, and their gestures collected and used in real time by experimenters to animate the GUI accordingly, making it appear reactive from the user’s point of view. The collected gesture data can then be analyzed to define a basic gesture set. In the second step, the defined gesture set is evaluated in-the-wild in order to gather users’ feedback and improve the set.
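A minimal sketch of such a wizard console is shown below in Python: the experimenter watches the camera feed, triggers the GUI reaction that matches the gesture the user just performed, and logs a timestamped description of the observed gesture for later analysis. Key bindings, reaction names and the log format are illustrative assumptions, not part of a validated tool.

```python
import csv
import time

# Hypothetical mapping from wizard keystrokes to GUI reactions.
REACTIONS = {
    "l": "scroll_left",
    "r": "scroll_right",
    "z": "zoom_in",
    "o": "zoom_out",
}

def wizard_console(logfile: str = "gestures.csv") -> None:
    """Loop: the wizard picks the reaction matching the observed gesture,
    the GUI is animated, and the gesture is logged with a timestamp."""
    with open(logfile, "a", newline="") as f:
        log = csv.writer(f)
        while True:
            key = input("reaction key (q to quit): ").strip()
            if key == "q":
                break
            reaction = REACTIONS.get(key)
            if reaction is None:
                continue
            # The free-form note records the gesture the user actually made,
            # which is the raw material for the elicited gesture set.
            note = input("describe the user's gesture: ")
            log.writerow([time.time(), reaction, note])
            print(f"GUI -> {reaction}")

if __name__ == "__main__":
    wizard_console()
```

In a real deployment the print statement would be replaced by a call into the display's rendering layer, but the logging structure (timestamp, triggered referent, observed gesture) is what later feeds the agreement analysis sketched in Sect. 2.2.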

This idea was inspired by a technique used by Good et al. [23] to develop a user-derived CLI (Command Line Interface). Basically, users had to interact with a console, using their own words as commands, in order to write and send an email. The same approach could be extended to create a gesture set as described above, although its deployment is expected to be considerably more complicated. On the other hand, this kind of approach is exactly what gesture elicitation studies have exploited, although their experiments have mainly been carried out in controlled environments. Our goal is to extend the original idea proposed by Good et al. to the wild, without any experimenter intervention.

The main problem with an unsupervised and uncontrolled approach is catching users’ attention and convincing them to interact with the display. In other words, display and interaction blindness still need to be overcome. However, we do not focus on display blindness here: we assume that persuasive computing methods [27] can be reused to reduce the occurrence of this phenomenon. In the following section, we discuss some principles aimed at reducing interaction blindness and making gestural interactions more natural for new users.

3.2 GUI for Touchless Gestural Interactions

In 1997, Andries van Dam introduced the notion of “post-WIMP user interfaces. These interfaces don’t use menus, forms, or toolbars, but rely on, for example, gesture and speech recognition for operand and operation specification” [24]. This notion has become common in the area of GUI design, but to date there are no standards as strong as the WIMP paradigm. Only in the main mobile operating systems for smartphones and tablets (e.g. Android or iOS) can we see some coherent approaches in the main GUI components. Touchless gestural interaction, however, is different, and it requires new ideas.

The definition of a new model for a touchless gestural-oriented GUI for public displays should follow four principles:

  1. the GUI should implicitly suggest its interactivity (i.e. implicitly avoid interaction blindness);

  2. the GUI should implicitly suggest the touchless and gestural nature of the interactions;

  3. the GUI should make interactions natural (see Footnote 1) and guessable;

  4. the GUI should minimize the effect of legacy bias [7] (i.e. GUI components should not have any relation to WIMP, leaving users free to guess the right gestures, without biases due to their previous experience with WIMP interfaces).

A possible solution for the first principle is the adoption of a proxemic interactive GUI, in which one or more components react to implicit user movements. In this way, anyone moving in front of the GUI can guess its interactive nature. The fourth principle can be implemented by proposing new paradigms, removing the typical WIMP signs (e.g. the “X” for closing windows) and components (e.g. windows and the pointer) of GUIs. The main issue consists of complying with the second and third principles. Their validity needs to be evaluated with suitable experiments, but some elements can serve as a good starting point for any GUI prototype. A promising approach to suggest the touchless and gestural nature of the interactions consists in using explicit hints in the interface. Furthermore, the affordances of some new technologies can be used to the same end (e.g. the presence of a visible Kinect-like device may suggest the interaction modality to anyone who recognizes the device).

The naturalness and guessability of the interactions are more complicated topics. The naturalness of an interface is strictly related to the meaning of natural interface, and can be judged by users but cannot be easily measured. Guessability, on the other hand, might be investigated with a long-term study in which the trend of user gestures is analyzed: gestures are guessable if many users use them repeatedly.
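As a sketch of such an analysis, the following Python fragment computes, per observation period, the share of interactions accounted for by each gesture in an in-the-wild log; a gesture whose share grows and stabilizes over time is a candidate for a guessable set. The log format and gesture names are hypothetical.

```python
from collections import Counter, defaultdict

def gesture_trends(log):
    """log: iterable of (week_index, gesture) observations.
    Returns, per week, the share of observations for each gesture,
    so the consolidation of a gesture can be tracked over time."""
    per_week = defaultdict(Counter)
    for week, gesture in log:
        per_week[week][gesture] += 1
    return {
        week: {g: n / sum(counts.values()) for g, n in counts.items()}
        for week, counts in sorted(per_week.items())
    }

# Hypothetical long-term log: "swipe" consolidates while "wave" fades.
log = [(1, "swipe"), (1, "wave"), (1, "wave"),
       (2, "swipe"), (2, "swipe"), (2, "wave"),
       (3, "swipe"), (3, "swipe"), (3, "swipe")]
for week, shares in gesture_trends(log).items():
    print(week, shares)
```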

3.3 Ease of Use, Social Acceptability and Cultural Resiliency of Gestures

A Wizard-of-Oz approach might ensure that gestures are thought up by users, which means that their preferences are incorporated in the final gesture set. By “preferences”, we do not only mean the personal strategy that each user prefers to adopt in order to interact with a display: everyone may prefer certain gestures for a specific action, and it is impossible to make users’ preferences uniform. However, as we stated in Sect. 2.5, there are some characteristics that should be shared by a gesture set: in addition to the previously cited guessability, gestures must also be socially acceptable and easy to perform (and use). Interestingly, a Wizard-of-Oz approach implies that social acceptability and ease of use are part of the users’ preferences: users are unlikely to perform a gesture if they are not motivated to use it or if the gesture feels inappropriate; furthermore, it is very unlikely that several users independently think up a common gesture that is difficult to use.

Another reason why Wizard-of-Oz may be an interesting approach is the possibility of capturing how users’ preferences change in relation to different social contexts and cultural backgrounds. Without an unsupervised experiment, in which users are not biased by researchers, it is more difficult to build a cultural-resilient gesture set. Valid and interesting results can instead be collected through multiple deployments in multiple places, such as different countries and locations. In this way it becomes possible to analyze the data and infer cultural similarities in the gestures used. These similarities may become a strong knowledge base for researchers, which can be used in selecting the gestures to include in the final set.
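A simple way to quantify such cross-site similarities, sketched below in Python, is the Jaccard similarity between the dominant gestures elicited at two deployments; the sites and gesture names are hypothetical, and gestures in the intersection are natural candidates for a cultural-resilient set.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two gesture sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

# Hypothetical dominant gestures elicited at two deployments.
site_a = {"swipe_left", "pinch_out", "push", "wave"}
site_b = {"swipe_left", "pull_apart", "push", "point"}

print(f"cross-site similarity: {jaccard(site_a, site_b):.2f}")
# Shared gestures are candidates for the cultural-resilient set.
print("shared gestures:", site_a & site_b)
```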

4 Conclusions

In this paper we presented the main challenges in designing methodologies to implement and investigate touchless gestural interactions with public displays. By definition, these displays are deployed “in-the-wild”, i.e. outside controlled environments. This implies studying such interactions with novel approaches, in uncontrolled situations and without the explicit (or perceived) presence of experimenters. Factors such as social context, audience behavior and cultural differences must primarily be taken into account.

Furthermore, the novelty of touchless gestural interactions poses new challenges to the HCI community: the goal is to offer new instruments and paradigms by which users can easily and naturally interact with public displays.

We proposed the use of a Wizard-of-Oz-based approach to collect users’ preferences on the gestures to be adopted for interacting with a public display, together with a suitable GUI inspired by the four principles given above, in order to make the display usable and to avoid phenomena such as interaction blindness. Following these ideas, we believe it will be possible to create a valid gesture set that is socially acceptable, easy to use and cultural-resilient. The latter property can be achieved by deploying several Wizard-of-Oz experiments in different places, countries and social contexts. Moreover, we are convinced that experimenter intervention should be minimized as much as possible, to avoid biasing users’ behavior.