The BEAMING Proxy: Towards Virtual Clones for Communication

Participants of virtual worlds and video games are often represented by animated avatars, and telerobotics allows users to be remotely represented by physical robots. In many cases such avatars or robots can also be controlled by fully automated agents. We present a conceptual framework for a communication proxy that unifies these two modes of communication. Users can communicate via their avatar or robotic representation, an autonomous agent can occasionally take control of the representation, or the user and the autonomous agent can share control of the representation. The transition between modes is seamless throughout a communication session, and many aspects of the representation can be transformed online, allowing for new types of human-computer confluence. We describe the concept of the communication proxy that has been developed and explored within the European Union BEAMING project, and describe one of the studies involving the proxy, in which the experimenter was perceived as both a man and a woman simultaneously.


Introduction
What would it be like to have a digital clone that looks like you, behaves like you, and represents you at events that you cannot attend? What would it be like to be perceived by others as a man and a woman at the same time? These days we hold many personal and professional meetings that do not take place face to face but are mediated through communication technologies, such as mobile phones, video conference systems, social networks, online virtual worlds, and more. This opens the possibility that our representation in the communication medium could be controlled by autonomous software rather than by ourselves. We are by now accustomed to simple automated responses such as voice answering machines and automated email replies. In this chapter we describe a communication proxy that takes this concept further, towards a more sophisticated representation of the user that can engage others in lengthy events, opening possibilities for novel human-computer confluence scenarios.
The proxy is part of the BEAMING project, which deals with the science and technology intended to give people a real sense of physically being in a remote location with other people, and vice versa, without actually traveling (Steed et al., 2012). We explore the concept of the proxy in the context of several applications: remotely attending a business meeting, remote rehearsal of a theater play, remote medical examination, and remote teaching. In such applications it is useful to have a digital extension of yourself that is able to assist you while communicating with others or to replace you completely, either for short periods of a few seconds due to some interruption, or even for a complete event. Our approach is to see the proxy as an extension of our own identity and to allow a seamless transition between user control and system control of the representation.
The communication proxy operates in three modes of control. If the proxy is in autonomous (foreground) mode, its purpose is to represent its owner in the best way possible. To that end, the proxy has to be based on models of the owner's appearance and behavior, and it has to be aware of its owner's goals and preferences. When the proxy owner is using the telepresence system personally, the proxy is in background mode: it is idle but can record data and is ready to take control of the avatar representation automatically. In addition, the proxy can operate in mixed mode, in which the owner controls some aspects of the communication and the proxy controls other aspects, both at the same time.
We are interested in the proxy not only as a technological challenge but also conceptually, as a possible future inhabitant of society. Proxies could be physical humanoid robots, but since we spend increasing amounts of our time in digital spheres they could also be virtual agents. Most of us can think of cases where we would have loved to have someone replace us in annoying, boring, or unpleasant events. However, in what contexts will proxies be socially acceptable? What are the legal and ethical implications? In this chapter we focus on the conceptual issues that arose in our work; for a more technical overview see (Friedman, Berkers, Salomon, & Tuchman, 2014).

Semi-autonomous virtual agents
The proxy is a specific type of intelligent virtual agent. Such agents have been studied widely, mostly as autonomous entities (e.g., Prendinger & Ishizuka, 2004; Swartout et al., 2006). There has been much research on the believability (starting with Bates, 1994) and expressiveness of virtual agents (e.g., Pelachaud, 2005), on multi-modal communication, and on the role of nonverbal behavior in coordinating and carrying out communication (e.g., Vinayagamoorthy et al., 2006).
Semi-autonomous avatars were first introduced by Cassell, Vilhjálmsson and colleagues (Cassell, Sullivan, Prevost, & Churchill, 2000; Cassell & Vilhjálmsson, 1999; Vilhjálmsson & Cassell, 1998). Their system allowed users to communicate via text while their avatars automatically animated attention, salutations, turn taking, back-channel feedback, and facial expression. This is one of the main points of interest in the shared control spectrum, in which verbal communication is handled by the human and nonverbal behavior is automated, but we will see additional motivations for shared control below.
Others have discussed and demonstrated semi-autonomous avatars, mostly addressing the same goal: automating non-verbal behavior. Penny et al. (Penny, Smith, Sengers, Bernhardt, & Schulte, 2001) describe Traces, a Virtual Reality system incorporating avatars with varying levels of autonomy, in which a user's body movements spawn avatars that gradually become more autonomous. Graesser et al. (Graesser, Chipman, Haynes, & Olney, 2005) presented an educational mixed-initiative intelligent virtual agent focusing on natural-language dialogue. Gillies et al. (Gillies, Ballin, Pan, & Dodgson, 2008) provide a review and some examples of semi-autonomous avatars. They raise the issue of how humans communicate with their semi-autonomous avatars, and enumerate several approaches.

Digital extensions of ourselves
Science fiction often portrays autonomous artificial humans (such as in the movie Blade Runner) or remote control of a virtual or physical representation (such as in the movie Surrogates). Some of these entities are reminiscent of our proxy; for example, in the Safehold book series the writer David Weber introduces the concept of a personality-integrated cybernetic avatar (commonly abbreviated to PICA): a safe and improved physical representation of yourself in the physical world, into which you can also opt to "upload" your consciousness. Such concepts are penetrating popular culture; for instance, the appearance of a digital Tupac at the Coachella festival in 2012 provoked much discussion of the social implications and ethical use of digital technology (the original Tupac was a rap musician killed in 1996). Artstein et al. (Artstein et al., 2014) have implemented such an interactive virtual clone of a Holocaust survivor, focusing on photorealistic replication.
Clarke (Clarke, 1994) introduced the concept of a digital persona, but this was a passive entity made of collected data rather than an autonomous or semi-autonomous agent. Today we see such virtual identities in online social networks, such as Facebook profiles, but these are passive representations. Ma et al. (Ma, Wen, Huang, & Huang, 2011) presented a manifesto of what they call a cyber-individual, which is "a comprehensive digital description of an individual human in the cyber world" (p. 31). This manifesto is generic and presented at a high level of abstraction, and it is partially related to our more concrete concept of the proxy as described here. The notion that computers in general have become part of our identity in the broadest sense has been discussed by psychologists (Turkle, 1984, 1997) and philosophers (e.g., in the extended mind theory (Clark & Chalmers, 1998)). In this project we take a step towards making these ideas concrete, by merging the identity of the person with his or her avatar representation.

Semi-autonomous scenarios in robotic telepresence
Shared control has been explored extensively in the context of teleoperation, in order to help robots and devices overcome the delay that is often introduced by telecommunication (Sheridan, 1992). Several modes of operation are distinguished: i) master-slave: the robot is completely controlled by the operator, ii) supervisor-subordinate: the human makes a plan and the robot executes it, iii) partner-partner: responsibility is more or less equally shared, iv) teacher-learner: the robot learns from a human operator, and v) fully autonomous. This taxonomy is geared towards teleoperation and task performance, whereas we are interested in the remote entity as a representation of the operator, mostly in social contexts. Thus, our taxonomy is somewhat different. Master-slave is equivalent to the background mode of the proxy, and the proxy can also learn during this mode (similar to the teacher-learner mode). In terms of controlling a representation there is little difference between the supervisor-subordinate mode and our proxy's foreground mode. This leaves the partner-partner mode, which we call the mixed mode, and that is the territory we wish to explore further.
Telerobotics has also been used for social communication. Paulos and Canny (Paulos & Canny, 2001) describe personal roving presence devices (PRoPs) that provide a physical mobile proxy, controllable over the Internet, to provide tele-embodiment. These only used teleoperation (the master-slave mode). Venolia et al. (Venolia et al., 2010) came up with the concept of the embodied social proxy and evaluated it in a real-life setting. The goal was to allow remote workers to be more integrated with the main office. Their setup evolved to include an LCD screen, cameras, and a microphone, all mounted on a mobile platform. Live communication was carried out via video conference, and when the remote person was away the display showed abstract information about them: calendar, instant-messaging availability, and connectivity information. Thus, this concept overlaps with our proxy in terms of motivation. However, the representation of users when they are away is minimal and iconic, whereas our goal is to provide a rich representation in such cases. Additionally, this concept does not include a mixed mode of shared control. Telepresence robots have recently been made available for hire on a commercial basis by several companies.
Ong et al. (Ong, Seet, & Sim, 2008) describe a telerobotic system that allows seamless switching between foreground and background modes, but it too is geared towards teleoperation and task performance rather than communication. More recently, there have been demonstrations of robotic telepresence with shared control: Lee et al. (Lee, Toscano, Stiehl, & Breazeal, 2008) describe a semi-autonomous robot for communication, designed as a teddy bear. The robot is semi-autonomous in that it can direct the attention of the communicator using pointing gestures.

Virtual clones
There are many aspects of a person that we may wish to capture in the proxy: appearance, non-verbal behavior, personality, preferences, professional knowledge, social network of friends and acquaintances, and more. In designing the proxy we distinguish between two relatively independent goals: i) appearance: the proxy has to pass as a recognizable representation of its owner, and ii) goals: the proxy has to represent the interests of its owner.
Appearance is addressed by the computer graphics and image processing communities, and progress in these fields can lead to highly photorealistic communication proxies. Creating a 3D look-alike of a person is now possible with off-the-shelf products, and humanoid robotic look-alikes have also been demonstrated (Guizzo, 2010). Personal characteristics in other modalities, such as sound, touch, and non-verbal communication style, can also be replicated with varying degrees of success. We have previously shown how the social navigation style of the owner can be captured by the proxy using a behavioral-cloning approach (Friedman & Tuchman, 2011): we capture the navigation style of users in a virtual world when they approach a new user, and construct a behavioral model from this recorded data that may be used when the proxy is in foreground mode.
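To make the idea of behavioral cloning concrete, the following is a minimal sketch, not the model used in (Friedman & Tuchman, 2011). It assumes that each recorded approach is reduced to a hypothetical feature pair (distance to the approached user, relative bearing) and an action pair (forward speed, turn rate), and it uses simple k-nearest-neighbour averaging as the learned policy.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    distance: float  # metres to the user being approached (assumed feature)
    bearing: float   # radians between heading and that user (assumed feature)
    speed: float     # recorded forward speed of the owner's avatar
    turn: float      # recorded turn rate of the owner's avatar


class NavigationClone:
    """Hypothetical k-nearest-neighbour clone of an owner's approach style."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.samples: list[Sample] = []

    def record(self, sample: Sample) -> None:
        # Called in background mode, while the owner drives the avatar.
        self.samples.append(sample)

    def act(self, distance: float, bearing: float) -> tuple[float, float]:
        # Called in foreground mode: average the actions of the k most similar
        # recorded situations. Features are unscaled here; a real model would
        # normalise metres and radians before comparing them.
        if not self.samples:
            raise ValueError("no recorded behaviour to imitate")
        nearest = sorted(
            self.samples,
            key=lambda s: math.hypot(s.distance - distance, s.bearing - bearing),
        )[: self.k]
        speed = sum(s.speed for s in nearest) / len(nearest)
        turn = sum(s.turn for s in nearest) / len(nearest)
        return speed, turn
```

Nearest-neighbour averaging is chosen only because it needs no training step; any regression model fitted to the same recorded pairs would fill the same role.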
It is important that the proxy's "body language," mostly gestures and postures, be similar to its owner's. In a relevant study, Isbister and Nass (Isbister & Nass, 2000) separated verbal from non-verbal cues in virtual characters: they exposed users to virtual humans whose verbal responses came from one person and whose non-verbal behavior came from another. Consistent characters were preferred and had greater influence on people's behavior.
We see digital proxies becoming an important part of everyday life in the rest of the 21st century. We spend a lot of time in cyberspace, and virtual proxies are natural candidates for representing us there. Our first-generation proxy inhabited the 3D virtual world Second Life (SL). It is based on our SL bot platform, which allows bots to perform useful tasks, such as carrying out automated survey interviews in the role of virtual research assistants (Béatrice . We prepared the SL proxy for the co-author of this chapter to give a talk inside SL as part of a real-world conference (Figure 1): a workshop on Teaching in Immersive Worlds, in Ulster, Northern Ireland, in 2010. The proxy's appearance was canceled on the day of the event due to audio problems at the conference venue, but a video illustrates the vision and concept.

Modes of operation
The proxy works in several modes of operation along the control spectrum and is able to switch smoothly between modes during its operation.
Background mode (Figure 2a): When the user is in control of the avatar, the proxy is in background mode. During a communication session the proxy owner may be distracted: someone may enter their office, they may receive a phone call, or they may need a coffee break. The proxy is able to detect these situations automatically and proactively switch to foreground (or mixed) mode (e.g., when the owner is being tracked and decides to leave the room). Alternatively, the owner can initiate the transition. During background mode the owner's behavior is recorded; this is expected to be the main source of behavioral models for the proxy. Typically, we record the proxy owner's non-verbal behavior; the skeleton data is tagged with metadata regarding the context, and this data is used in mixed and foreground modes to give the proxy its owner's "body language."

Foreground mode (Figure 2b): When the owner is away, the proxy is in foreground mode. The proxy should know when to take control of the remote representation (i.e., switch to foreground mode) and when to release control back to the user (i.e., switch back to background mode). This "covering up" for the owner may be necessary for short interruptions of several seconds or for the whole duration of an event. If the proxy merely replaces the owner for a short while, its behavior can be very basic; for some applications it may be enough for the proxy to display reactive body language. For longer communication sessions it is highly useful for the proxy to be able to answer some informational questions on behalf of its owner and to have some understanding of the owner's goals. Ideally, the proxy would learn this information implicitly by observing the user (while in background mode) or from user data. For example, our proxy can access its owner's calendar and location (based on the owner's smartphone location), and access to additional inputs can be conceived.
Mixed mode (Figure 2c): When the proxy owner and the proxy both control the same communication channel at the same time, we say that the proxy is in mixed mode. In this case the owner controls some of the functionality and the proxy controls the rest. The challenge is to provide a fluent experience for the owner.

Figure 2: Schematic network diagrams of the proxy configured in three modes: (a) background, (b) foreground, and (c) mixed mode. The diagrams are deliberately abstracted for simplification. Full lines denote continuous data streams, and dotted lines denote discrete events. Different colors denote the different modalities.
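The transitions among the three modes described above can be summarized as a small state machine. The sketch below is illustrative only: the event names (`owner_left`, `owner_distracted`, and so on) are hypothetical stand-ins for whatever tracking or explicit user input triggers a switch, and recording only in background mode is a simplification of the behavior described in the text.

```python
from enum import Enum


class Mode(Enum):
    BACKGROUND = "background"  # owner drives the avatar; proxy idles and records
    FOREGROUND = "foreground"  # proxy drives the avatar on the owner's behalf
    MIXED = "mixed"            # some functions by the owner, the rest by the proxy


# Hypothetical trigger names (e.g. from body tracking or an explicit request).
TRANSITIONS = {
    (Mode.BACKGROUND, "owner_left"): Mode.FOREGROUND,
    (Mode.BACKGROUND, "owner_distracted"): Mode.MIXED,
    (Mode.BACKGROUND, "owner_requests_takeover"): Mode.FOREGROUND,
    (Mode.MIXED, "owner_left"): Mode.FOREGROUND,
    (Mode.MIXED, "owner_attentive"): Mode.BACKGROUND,
    (Mode.FOREGROUND, "owner_returned"): Mode.BACKGROUND,
}


class ProxyController:
    def __init__(self) -> None:
        self.mode = Mode.BACKGROUND
        self.recorded_frames: list = []

    def handle(self, event: str) -> Mode:
        # Events that are not meaningful in the current mode are ignored.
        self.mode = TRANSITIONS.get((self.mode, event), self.mode)
        return self.mode

    def on_skeleton_frame(self, frame, context: str) -> None:
        # Simplification: only frames captured under full owner control are kept,
        # each tagged with context metadata for later behaviour modelling.
        if self.mode is Mode.BACKGROUND:
            self.recorded_frames.append((context, frame))
```

A table-driven design keeps the allowed transitions in one place, which makes it easy to see, for example, that the proxy never jumps from foreground mode directly to mixed mode in this sketch.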
The spectrum between user control and agent control is often described as in Figure 3. However, this view of the spectrum does not help us chart the interesting space in its middle. As mentioned in Section 2.3, the taxonomy from teleoperation is geared towards task performance, so it offers few lessons about the possibilities of mixed mode when it comes to shared control of a representation. We therefore suggest a conceptualization based on functional modules. From a technical point of view, the proxy operates as a network of interconnected modules. In principle, the functionality of each module can be carried out either by the software agent or by the human. For example, the decision of what to say can be dictated either by a human or by a natural language generation component, and this is independent of voice generation (e.g., in principle we can have a text generated by a chatbot read aloud by the proxy owner). Thus, we suggest that the human-agent shared control spectrum can be defined by the set of possible configurations of modules: if the number of modules is $N$, then the number of shared control modes is $2^N$, two of which (all modules controlled by the human, or all by the agent) correspond to the background and foreground modes.
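The counting argument can be checked directly by enumerating configurations. This sketch assumes a hypothetical list of five modules (the chapter does not enumerate the proxy's actual modules) and that each module is controlled by exactly one party, human or agent.

```python
from itertools import product

# Hypothetical module list; the chapter does not enumerate the proxy's modules.
MODULES = ["content", "voice", "gesture", "gaze", "navigation"]


def configurations(modules):
    """Yield every assignment of each module to either the human or the agent."""
    for owners in product(("human", "agent"), repeat=len(modules)):
        yield dict(zip(modules, owners))


def classify(config):
    """Name the proxy mode that a configuration corresponds to."""
    controllers = set(config.values())
    if controllers == {"human"}:
        return "background"
    if controllers == {"agent"}:
        return "foreground"
    return "mixed"


configs = list(configurations(MODULES))
mixed = [c for c in configs if classify(c) == "mixed"]
print(len(configs), len(mixed))  # 32 30, i.e. 2**5 and 2**5 - 2
```

The chatbot example from the text, where the agent decides what to say but the owner speaks it, is one of the 30 mixed configurations: `content` is assigned to the agent and `voice` to the human.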

"Better than being you"
Presence has been described as "being there" (Heeter, 1992). Our expectation is that when advanced telecommunication technologies are deployed they may even provide a "better-than-being-there" experience, at least under some circumstances and in some respects. Similarly, an interesting opportunity presents itself: the proxy can be used to represent its owner better than the owner would represent him- or herself. For example, you might consider a proxy that is based on your appearance with a beautifier transformation applied (Looks, Goertzel, & Pennachin, 2004). Analogously, we have demonstrated the possibility of a proxy that extends your vocabulary of foreign nonverbal gestures (Béatrice S Hasler, Salomon, Tuchman, Lev-Tov, & Friedman, 2013): the proxy can be configured to recognize that the owner has performed culture-specific gestures and replace these gestures, online, with the equivalent gestures in the target culture's vocabulary. In another study we used the proxy system's ability to generate non-verbal communication automatically in order to apply imitation as a means of increasing intergroup empathy (B. S. Hasler, Hirschberger, Shani-Sherman, & Friedman, 2014).
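The gesture replacement can be thought of as a lookup applied to the stream of recognized gestures. The sketch below assumes that an upstream recognizer (not shown) already labels the owner's tracked gestures; the culture identifiers and gesture labels are placeholders, not the gestures used in the cited study.

```python
from typing import Iterable, Iterator

# Hypothetical lookup table: labels are placeholders, not gestures from the study.
GESTURE_EQUIVALENTS = {
    ("culture_a", "culture_b"): {
        "a_wait": "b_wait",
        "a_come_here": "b_come_here",
    },
}


def translate_gestures(recognized: Iterable[str], source: str, target: str) -> Iterator[str]:
    """Replace culture-specific gestures with their target-culture equivalents.

    `recognized` is assumed to be the output of a gesture recognizer running on
    the owner's tracked skeleton; gestures without an entry pass through unchanged.
    """
    table = GESTURE_EQUIVALENTS.get((source, target), {})
    for gesture in recognized:
        yield table.get(gesture, gesture)
```

Because the function is a generator, it can sit inline in a live stream: each recognized gesture is translated as soon as it arrives, and gestures that are not culture-specific are passed through to the avatar untouched.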
One of the applications of the proxy is allowing a person to take part in multiple events at the same time. For example, the proxy owner can be at home or in her office, remotely attending one event in one location using telepresence, while her proxy fills in for her at a second event (in a third physical location). The proxy owner can switch back and forth between the two events, such that at each moment her attention is given to one event while the proxy replaces her in the other. Such a scenario requires two different configurations of the proxy working in synchrony; technically, this involves running two clones of the proxy (which is itself a clone of the individual human) simultaneously.
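The coordination logic for attending several events can be sketched as follows, assuming one proxy instance per event and a hypothetical `focus_on` call triggered when the owner shifts attention. The event holding the owner's attention runs in background mode, and every other event runs in foreground mode.

```python
class MultiEventSession:
    """Hypothetical coordinator for one owner attending several events at once.

    Each event has its own proxy instance; the event holding the owner's
    attention runs in background mode (owner in control) and all other events
    run in foreground mode (proxy in control).
    """

    def __init__(self, events):
        # Before the owner focuses anywhere, every proxy covers for the owner.
        self.modes = {event: "foreground" for event in events}
        self.focus = None

    def focus_on(self, event):
        if event not in self.modes:
            raise KeyError(f"unknown event: {event}")
        for name in self.modes:
            self.modes[name] = "background" if name == event else "foreground"
        self.focus = event
```

Keeping the invariant that at most one event is in background mode mirrors the scenario in the text, where the owner's attention is given to one event at a time.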
Another advantage of the proxy over its human owner is that the proxy can have what we refer to as online sensors. Just as the proxy handles input streams such as video and audio in ways that are analogous to human senses, it can also be configured to receive input streams from the online world. Our proxy is integrated with Twitter, Skype, and a smartphone: using an iPhone application, we allow the proxy to receive a continuous stream of information about the owner's smartphone location, and the proxy also has access to its owner's calendar. Currently this allows the proxy to answer simple queries about its owner's whereabouts and schedule, but more sophisticated applications of these data streams can be imagined.
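A whereabouts query of the kind described above might combine the latest location report with the current calendar entry. This is a minimal sketch under stated assumptions: the location arrives already resolved to a place name, the calendar is a plain list of entries, and the class and method names are hypothetical rather than those of the BEAMING proxy.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CalendarEntry:
    start: datetime
    end: datetime
    title: str


@dataclass
class OnlineSensors:
    owner: str
    calendar: list = field(default_factory=list)
    last_location: Optional[str] = None  # assumed: place name resolved from phone GPS

    def update_location(self, place: str) -> None:
        self.last_location = place

    def current_entry(self, now: datetime) -> Optional[CalendarEntry]:
        for entry in self.calendar:
            if entry.start <= now < entry.end:
                return entry
        return None

    def answer_whereabouts(self, now: datetime) -> str:
        # Combine whatever the online sensors currently know into one answer.
        parts = []
        if self.last_location:
            parts.append(f"{self.owner} was last located at {self.last_location}")
        entry = self.current_entry(now)
        if entry is not None:
            parts.append(f"the calendar shows '{entry.title}' until {entry.end:%H:%M}")
        if not parts:
            return f"I don't know where {self.owner} is right now."
        return "; ".join(parts) + "."
```

For example, with a calendar entry "Faculty meeting" from 9:00 to 11:00 and a location of "the campus", asking at 10:00 yields "Alex was last located at the campus; the calendar shows 'Faculty meeting' until 11:00." for a hypothetical owner named Alex.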

Ethical Considerations
We envision that in the not-too-far future our society may be swarming with various kinds of proxies, and specifically communication proxies. Thus, we provide here a preliminary ethical discussion. For a legal analysis of the proxy's implications for contract law and some recommendations, see the BEAMING legal report (Purdy, 2011), specifically pages 68-77, which refer to the proxy.
De Angeli (De Angeli, 2009) discusses the ethical implications of autonomous conversational agents, suggesting a critical rhetoric approach that shifts the focus from the engineering of the agents to their psychological and social implications, based mostly on findings that virtual conversations can at times encourage disinhibited and antisocial behavior. Whitby (Whitby, 2008) discusses the risks of abusing artificial agents, and in particular robots; for example, such abuse might desensitize the perpetrators. These discussions are relevant, but our concept of the proxy introduces additional issues, since we focus on the proxy as an extension of its owner rather than as a separate entity.
One of the key issues is the ethics of deception. There is a general consensus in human societies that we need to know who we are really dealing with in social interactions. In general, we can expect people to notice whether they are interacting with the person or the proxy, especially with current technologies. But assume that the proxy takes over for just a few seconds to hide the fact that your colleague decided to answer a mobile phone call. Is this socially acceptable if you are not aware that your colleague's proxy took over? In fact, this kind of deception already takes place in online chat support systems, where support staff switch back and forth between typing responses themselves and using automated chatbots. The fact that these services exist indicates that people can accept such unclear transitions, at least in some contexts.
Legal and ethical issues are hard to disentangle from technical issues. For example, a proxy owner may explicitly instruct their proxy to avoid taking responsibility for any assigned tasks. However, assume that during a session everyone is seated, and the boss says: "those who wish to be responsible for this task please remain seated." Since we do not expect the proxy to be able to understand such spoken (or even typed) text, at least not with high certainty, there is a fair chance that the proxy would remain seated. Does this now commit the proxy's owner to the task? This may be especially tricky if the proxy remained seated while in mixed mode.

Evaluation Scenarios
One evaluation study was conducted in a real classroom, where the proxy replaced the lecturer during class. This study was intended both as a technical evaluation of the system and as a way to obtain feedback on the concept of the proxy; the results are described in detail in a separate paper (Figures 6, 7). In this chapter we describe in detail another study, which aims to illustrate how the concept and architecture of our proxy can be used in an unconventional fashion.

The dual gender proxy scenario
Telepresence provides an opportunity to introduce a "better than being there" experience. For example, Bailenson et al. introduced the transformed social interaction (TSI) paradigm (Bailenson, Beall, Loomis, Blascovich, & Turk, 2004), which describes how the unique characteristics of collaborative virtual environments may be leveraged to provide communication scenarios that are better than face-to-face communication. In this section we show how the proxy concept can be naturally extended to include the TSI paradigm.
The study consisted of a business negotiation scenario: in each session two participants of mixed gender were given background information about a business decision, and were asked to reach an agreement within 20 minutes. The participants were told that they would hold the discussion using video conference and that there would be a mediator, represented by an avatar, whose role is to help them reach an agreement in the allotted time. The instructions were provided in a way that would not disclose the gender of the mediator.
In each session the two participants were placed in two separate meeting rooms, each in front of a large projection screen, and were provided with a microphone for speaking. Both participants watched a split screen: most of the screen showed the mediator avatar, in our custom application, and a small portion of the screen, in the top right corner, showed a live video feed of the other participant, using Skype (Figures 8, 9). The mediator avatar was controlled by a confederate sitting in a third room. Using our proxy program, the confederate received the live video and audio from the Skype session and could type responses in a text window. The text was immediately converted into speech and delivered by the proxy avatar simultaneously to both participants. The confederate followed a structured discussion template in order to keep the intervention as similar as possible across all experimental sessions. In all sessions the mediator avatar was controlled by the same female experimenter.

Eighteen participants (ages 22-28) were recruited on campus and participated in the study for academic credit. The study included three conditions, with three different couples in each condition: i) MM: both participants experienced a male mediator, ii) FF: both participants experienced a female mediator, and iii) MF: the male participant experienced a male mediator and the female participant experienced a female mediator (Figure 10). All couples were mixed: a male and a female.
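The dual gender manipulation amounts to rendering the same typed mediator text differently for each recipient. The sketch below encodes the three conditions as described in the study; the avatar and voice identifiers are hypothetical, since the chapter does not name the assets used.

```python
# Mediator gender shown to each participant, per condition (as in the study design).
CONDITIONS = {
    "MM": {"male_participant": "male", "female_participant": "male"},
    "FF": {"male_participant": "female", "female_participant": "female"},
    "MF": {"male_participant": "male", "female_participant": "female"},
}

# Hypothetical avatar and text-to-speech voice identifiers.
MEDIATOR_ASSETS = {
    "male": {"avatar": "mediator_male", "voice": "tts_male"},
    "female": {"avatar": "mediator_female", "voice": "tts_female"},
}


def render_utterance(text, condition):
    """Return one rendering per participant of the same typed mediator text."""
    renderings = {}
    for participant, gender in CONDITIONS[condition].items():
        # The words are identical for everyone; only appearance and voice differ.
        renderings[participant] = {"text": text, **MEDIATOR_ASSETS[gender]}
    return renderings
```

Separating the content of the utterance from its per-recipient rendering is what allows a single confederate to be perceived as a man by one participant and as a woman by the other at the same moment.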
The hypothesis was that participants would perceive a mediator of their own gender to be "on their side." A further hypothesis was therefore that the mediator would play its role more effectively in the dual gender (MF) condition. At the end of the session each participant went through a semi-structured interview with a fixed set of questions. The interview started with general questions about the negotiation session and about the experience with the technical setup. This was followed by two questions about the mediator and his or her perceived contribution. Only the last question explicitly asked the participants about the mediator's gender and how it played a part in the mediation process. The interviews were video-recorded and transcribed for analysis.

Results
Since the number of sessions in each of the three conditions was small (three sessions each), we could not perform a quantitative analysis. One of the three couples in the MM condition reached an agreement in the allotted time, as did one of the three in the FF condition and two of the three in the MF (dual gender) condition; this provides preliminary support for our hypotheses but could be anecdotal.
The subjective reports further support our assumptions. Even when the participants did not explicitly refer to the gender of the mediator, their comments suggest that the mediator's gender may have been significant to the experience. In the MM condition the participants referred to the mediator's gender several times. Participant F1 (age 23) commented: "I would advise to think about a solution that would make the mediator more useful," whereas a male participant (M2, age 24) commented: "despite the slight delay he [the mediator] promoted us and encouraged us to reach an agreement." One of the female participants (F3, age 26) explicitly commented: "the mediator wasn't fair. Every sentence he uttered was biased towards the other side. In negotiation the mediator is supposed to be unbiased and not so aggressive as was the case here." The subjective reports in the FF condition were also consistent with our hypothesis. For example, one of the male participants (M5, age 22) commented: "the mediator was too gentle. Although she intervened frequently she was not able to narrow the wide gaps, and even damaged the negotiation. I think the mediator needs to be a man who will take control of the negotiation from start to finish." In the dual gender condition, further comments reinforce our expectations. Participant F8 (age 25): "I think the mediator helped us move forward, no doubt the gentle and positive female character assisted the negotiation." Participant M9 (age 27): "in 2-3 more minutes we would have reached agreement; the mediator helped me and was on my side." Participant F9 (age 26): "The mediator was efficient. Even though we started with large gaps she helped us move forward and we were very close to agreement. I have no doubt that her presence accelerated the process."

Conclusions
We have introduced the concept of a communication proxy, which is able to replace a human in a telepresence session and seamlessly switch among several control modes of the human-machine control spectrum, and transform various aspects of the communicator online. We have described an implemented system, which allowed us to explore various configurations and modes of communication with a shared set of modules.
For many of us it would be convenient to use proxy replacements. However, would we want to live in a society where others are often represented by proxies? As mentioned in  there was a significantly higher acceptance of the concept of a proxy by male participants than female participants, but a vast majority of both genders would be happy to use a proxy. Based on these preliminary findings we conclude that we are likely to see various kinds of communication proxies gradually deployed. We encourage the developers of such autonomous intelligent agents to be aware of the social, ethical, and legal implications while pushing the technology further.