On Studying Human Teaching Behavior with Robots: A Review

Studying teaching behavior under controlled conditions is difficult. It seems intuitive that a human learner would have trouble reliably recreating the same response patterns over and over in interaction. A robot would be the perfect tool for studying teaching behavior because its actions can be well controlled and described. However, due to the interactive nature of teaching, developing such a robot is not an easy task. As we show in this review, respective studies require certain robot appearances and behaviors: these should mainly induce teaching behavior in humans, be interactive, match the study design, and be realizable in terms of effort. We discuss how remote control of the robot or simulation of robot capabilities is used as an option. With this review, we introduce the field of research on studying human teaching behavior with robots as a tool in the experimental design. We provide a structured overview of existing work and identify the main challenges of employing robots in such studies.


Introduction
In the field of robotics, the matter of teaching and learning is of increasing interest: On the one hand, there is the idea of the robot as a never-tired, objective teacher which could one day teach pupils in an individualized, one-on-one manner. Children seem to be fond of robots, and though results are mixed, research indicates that children with an autism spectrum disorder also prefer robots as interaction partners over humans (Tapus et al. 2012), making robots more suitable teachers for them. Currently, robotic teachers are being developed to teach children about diabetes (the ALIZ-E project), a second language (the L2ToR project), and of course how to program (the Root robot). Intelligent tutoring systems are being developed that aim at optimizing the input for students (the Kidlearn project).
On the other hand (and this is the direction we are focusing on here), robots need to be learners: robots cannot cope with every possible situation they could ever encounter, as this would require a programmer to prepare them universally beforehand. Also, everyday users should be able to teach a robot their preferences and demands without having to program the robot, which presents additional motivation in the same sense. These users are foremost inexperienced in human-robot interaction (HRI) and do not have a technical background, such that a robot learner should be capable of participating in and understanding natural human teaching interactions. And indeed, robots are viewed as not having much experience of the world, similar to young children, who are very good learners. Teaching children comes naturally to humans and presents a possible way to transfer knowledge to robotic systems. Enabling robotic systems to develop and learn new skills from users requires equipping them with appropriate learning algorithms capable of incrementally acquiring skills from few examples.
In the remainder of this article, we focus on the robot system as the learner, learning from a human teacher in interaction.
In addition to the development of suitable hardware and software in the field of robotics, human-robot teaching interactions are investigated in studies, in parallel, for two main reasons. Most studies or experiments involving teaching interactions evaluate a certain learning algorithm (e.g. Argall et al. 2009). This is important because if research on robot learning did not take the interaction with (inexperienced) users into account, we might develop systems that are in the end not usable at all. In order to test how a specific learning algorithm performs, the developer tailors the interaction to the properties of the algorithm, leaving little to no freedom to the teaching user (Vollmer et al. 2016). Only a few HRI studies set out to investigate human teaching behavior itself. In the cognitive and developmental sciences, natural teaching has not been a major focus and the literature is mostly dedicated to learning, not to teaching processes (Strauss and Ziv 2012). Theories on how we learn are used to create theories and guidelines on optimal teaching, but studies on how teaching is done naturally are relatively scarce. Therefore, one of the first steps in developing systems that can learn from users should be to understand how these users naturally teach. In the mid-term, we should then also understand how their teaching behavior can be influenced by the robot learner for its own benefit in knowledge acquisition.
That robots are fully controllable in experiments presents a great advantage over interactions where the learner is a human or even a child, who cannot present the teacher with constant behavior and thus makes it difficult, if not impossible, to steer or fix the experimental conditions. In HRI, individual factors can be isolated such that the relevance of specific features, for example behaviors like confirming the target of a taught action via gaze, can be tested and described. The level of experimental control that the consistency of a robot learner allows is hard to achieve with a human learner. The behavior of a robot learner is not influenced by social signals or cognitive load.
Thus, viewed from the other end, these human-robot teaching interactions might present a powerful opportunity to study teaching under controlled conditions, using the robot as a tool in the experimental design. Additionally, in a first stage, these interactions are also interesting from the viewpoint of understanding to what extent human teaching behavior extends to inanimate objects like robots.
In this work, we aim to introduce the field of research on studying human teaching behavior with robots. After giving a short background on HRI experiments in Section 2, we review the current literature on the topic to provide a first overview of existing studies with respect to their design and the design of the robotic systems used (Sections 3 and 4), and we discuss their results in relation to relevant results from other fields (Section 5). Based on the analyzed literature, we identify requirements for these designs which should be met when conducting a respective study. We conclude the literature review by presenting important challenges encountered in doing so (Section 6).

Main Approaches to HRI Experiments
HRI is an established field with hundreds of studies published in recent years covering a wide variety of topics. Goodrich and Schultz (2007) reviewed a body of literature on HRI and identified some general practices. One method is to develop a real robotic system and evaluate it with the help of human participants. Another approach is to work with simulated robots to avoid technical challenges. Furthermore, robot autonomy can be simulated by remote controlling the robot, which is called the Wizard of Oz technique (WoZ). In a WoZ experiment, participants believe the robot to be autonomous, while it is actually fully or partially operated by an (unseen) experimenter. WoZ is applied without the knowledge of the participants and thus involves deception. Baxter et al. (2016) analyzed current characteristics of HRI research. In most studies, the robots are not autonomous; they are partially or fully operated by a wizard to overcome technical challenges related to perception or to the robot's cognitive abilities. The studies are typically rather short and conducted in a lab environment with participants drawn from university populations. A common method of assessing the importance of the results is Null Hypothesis Significance Testing (NHST) (Baxter et al. 2016). To study the effect of a specific robot property or behavior, an experiment is designed with two or more conditions. For example, in one condition the behavior is present and in the other it is not. Based on the experimental results, it is tested whether the Null Hypothesis (i.e., that the difference in behavior has no impact on the variables measured) can be rejected. Measurements can be obtained by different means. A typical approach is to videotape the experiment and calculate metrics based on annotations of the interaction between human and robot, but external sensors, the robot's internal sensors, and data logs can also be used. Another common method is to query the users' responses after the experiment using questionnaires or the like.
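To make the condition-based NHST workflow concrete, the following minimal Python sketch (with invented numbers and a hypothetical dependent measure) compares a measurement between two conditions using Welch's t-test. It illustrates the general procedure only and is not an analysis from any of the reviewed studies.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a measurement (e.g. mean demonstration velocity in cm/s)
# for participants assigned to two conditions of an HRI teaching study.
condition_a = np.array([12.1, 9.8, 11.4, 13.0, 10.2, 12.7, 11.9, 10.8])
condition_b = np.array([14.3, 13.5, 15.1, 12.9, 14.8, 13.2, 15.6, 14.0])

# Welch's t-test: does the robot-behavior manipulation affect the measure?
t, p = stats.ttest_ind(condition_a, condition_b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")

# Reject the Null Hypothesis ("the manipulation has no effect") at alpha = .05
alpha = 0.05
print("reject H0" if p < alpha else "fail to reject H0")
```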
Studying human teaching behavior with robots can be seen as a special case of HRI studies. Though learning from human teachers is a well-known problem domain in HRI, as for example in the extensive fields of programming by demonstration and imitation learning (cf. Schaal 1999; Billard et al. 2008), only few works focus on human teaching.

Methodology of Review
We searched for relevant studies matching the following criteria: (1) the study presents teaching interactions with a human as the teacher and a robot as the learner; (2) the study does not focus only on the performance of the robotic system, but also explicitly evaluates teaching behavior.
Relevant literature is scarce. We found and selected 18 papers for our review, though we do not claim exhaustiveness. This is especially the case because, to the authors' knowledge, there does not exist a specialized venue for this topic and it is difficult to identify HRI work with hidden secondary results on human teaching. Despite sharing the same general research question, there are many differences on various levels between individual works. Therefore, in a first step, we have summarized the studies presented in the papers and extracted a set of items from each publication, which we present in table form to provide an overview of existing work on studying human teaching with robots. The type of study provides a high-level category describing the interactional context in which human teaching was studied. This has implications for the scenarios certain findings can be applied to. The teaching and learning content strongly affects how humans interact with the system, as it influences the type of demonstrations during teaching. Teaching behavior can be (partly) fixed or free to the user, depending on the focus of the study. The robot used in the study also affects how humans interact with the system. For example, its visual appearance causes human interaction partners to form expectations that might affect their initial behavior towards the robot. Figure 1 shows the robots used in the reviewed works (the robots of some studies, e.g., Knox et al. (2012), are not included as they are not depicted in the respective publications). The robot behavior causes further expectations and provides cues which affect how human interaction partners will adapt. Furthermore, the robot behavior is often an important aspect of the study design, especially when it is varied between conditions. Sometimes, robot capabilities are simulated. The user tends to interpret and understand the behavior of a robot in the same way as the behavior of a human interaction partner. This is used to create an illusion of autonomy. Thus, the simulated capabilities are often emulated, meaning that only the resulting behavior is implemented but not the cognitive and perceptual processes behind it. The simulated capabilities we identified are related to the two categories of perceptual and cognitive WoZ that Baxter and colleagues identified when describing what they refer to as the "Level of Robot autonomy" (Baxter et al. 2016). They are complementary in the sense that they not only include the capabilities simulated by a human wizard but address all robot behavior, as for example simulating action understanding or social capabilities with the design of clever robot gaze behavior. Still, they can also be divided into simulated perceptual and simulated cognitive capabilities. Typically, in each study, data is recorded and subsequently analyzed.
The type of data and the analysis methodology, such as derived measurements, are included in this review to better understand the perspective and the limitations of the corresponding results.
In order to further analyze the body of literature, we focus on a set of categories, which include aspects of the experimental design (study design) as well as aspects of the robotic systems used (robot design). For each category, simple descriptive statistics were calculated. The aim of the analysis of the current literature is to understand the impact and added value of its findings and to derive a number of requirements which have to be met for the study of human teaching behavior with robots.

Review - Summaries of Individual Papers
In the following, we present the individual papers of our review along the items detailed above. They are ordered according to their main foci.
Thomaz and Breazeal (2008) study how human users teach a simulated agent learner how to bake a cake in a very restricted web-based setup (see Table 1 and Fig. 1a). The agent is a reinforcement learner that explores its 'world'. It has four different actions in its repertoire (Go, Pick up, Put down, and Use) that it can apply to objects at locations. The kitchen environment the agent is in has five kitchen items and four locations. The successful completion of the task includes putting the flour and the eggs in the bowl, stirring them with the spoon, then pouring the resulting batter into the tray, and placing the tray in the oven. While the agent was exploring, the 18 participants that took part in the study could provide a reward signal to the learning algorithm. The reward was used to compute the agent's future actions. For giving a positive or negative reward, participants had to draw green (positive) or red (negative) boxes on the website by clicking and dragging the mouse. More specifically, they could give this reward either in general or for a specific object. Additionally, they were told that the size of the box they draw is communicated to the learner. But neither the size nor the object directedness mattered to the learning algorithm; they were only designed to study the participants' behavior. It took the agent about 30 trials before the cake-baking task was learned correctly. The main results were that the participants used the feedback or reward channel mainly for positive feedback, but also differently than expected. At specific moments in the sequence, they tried to direct the agent to the next relevant object instead of giving a reward for the current state. This led the authors to modify the learning algorithm in four follow-up studies. In the first follow-up study, the algorithm also accepted a guidance message for a next relevant object. The second follow-up study was concerned with using gaze to signal uncertainty about next actions. In the third study, the agent reverted to the previous state upon receiving a negative reward. The fourth study allowed participants to motivate the agent in addition to giving feedback. These follow-up studies were mainly evaluated with respect to learning, which was improved with each of the above modifications.
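To illustrate how such a human reward signal can feed into an exploring learner, the following sketch shows a generic tabular learner whose value update simply adds a weighted human reward to the environment reward. It is a hypothetical toy, not the algorithm used by Thomaz and Breazeal (2008); all names and parameters are illustrative.

```python
import random
from collections import defaultdict

class InteractiveQLearner:
    """Minimal tabular learner that blends environment reward with a
    scalar human reward signal (illustrative sketch only)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.2, human_weight=1.0):
        self.q = defaultdict(float)        # (state, action) -> value estimate
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.epsilon = epsilon
        self.human_weight = human_weight   # how strongly human feedback counts

    def choose(self, state):
        # epsilon-greedy exploration over the action repertoire
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, env_reward, human_reward, next_state):
        # Human feedback (e.g. +1 / -1 from the teacher) is simply added
        # to the environment reward before the standard value update.
        r = env_reward + self.human_weight * human_reward
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Example: four actions as in the cake-baking scenario
learner = InteractiveQLearner(actions=["go", "pick_up", "put_down", "use"])
```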
Thomaz and Cakmak (2009) investigated in their study the examples of object affordances that a human teacher provides to a robot learner (see Table 2 and Fig. 1b). The 14 participants of this study were sitting at a table facing an upper torso of a small humanoid robot that has a webcam as a head. The robot was placed on the table. The participants' task was to pick one object at a time from a set of colored objects (a cuboid, two spheres, a box, and a cube) and place it horizontally centered in front of the robot. The robot explored the affordance by each time carrying out one of two possible actions: poke and grasp. It started with one action (counterbalanced), and it was the participants' choice when trials with the next action started. These interactions with the objects could result in a total of seven different affordances: lifted, opened, other (for grasping) and rolled, moved, tipped (for poking), and no effect. The utilized learning algorithm operated offline, such that it was provided with the information on object and action and with the (after the study) hand-labeled affordances. The study compared the above social condition, in which the participants chose which object is presented with which action and how many times, to a non-social one. In the non-social condition, examples were distributed systematically and exhaustively (756 object interactions without a teaching user). The authors made a number of observations. Participants presented examples with no effect less often and showed rare affordances more often. They presented complex objects (i.e., objects with a high number of affordances) more often than simple ones, but presented simple examples first. Mostly, participants presented one object multiple times before switching to the next. In general, they did not go back and forth or move randomly through the set of objects. Another observation was that participants marked the goal of an action by 'interrupting' the robot during its action execution. Participants were instructed to wait until the robot had moved its arms back into their home position. However, for instance when an object was lifted, they already positioned their hands to catch it when it was dropped. Also, when an action had no effect, some participants repositioned the object before the robot completed the action. When an object could not be centered in the visual field by the robot, it looked up. This error behavior seems to reduce recovery time (the participant repositioning the object).
Khan et al. (2011) investigated human teaching behavior in view of models of computational teaching, i.e., the curriculum learning principle and the teaching dimension model (see Table 3). Thirty-one participants taught the one-dimensional concept of graspability to a robot in interaction. The robot used in this study is a human-like Mitsubishi Wakamaru robot (see Fig. 1c). The participants first sorted cards with black-and-white photos of common objects in a line according to the graspability of the objects and then gave the objects binary labels (graspable or not), resulting in a decision boundary. After that, participants left the room for the robot to inspect the cards, but not the labels. Then, participants began teaching in one of two conditions and showed the robot the cards one at a time, using as few examples as possible. In the first condition, participants used natural language to teach the robot; participants in the second condition only uttered the label "graspable" or "not graspable". The robot behaved the same in both conditions; it was programmed to follow motion. The study revealed three main teaching strategies: starting with extreme examples and moving toward the decision boundary, which was employed by most participants, following a linear sequence from one end to the other, and solely providing positive examples.
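The "extreme examples first" strategy can be stated compactly in code: order the items by their distance from the decision boundary and present the farthest ones first. The sketch below is an illustration with invented object scores, not data or code from Khan et al. (2011).

```python
def extreme_first_order(items, boundary):
    """items: list of (name, score) pairs on a one-dimensional concept
    (e.g. graspability); boundary: the teacher's decision threshold.
    Returns the items sorted from most extreme to closest to the boundary."""
    return sorted(items, key=lambda item: abs(item[1] - boundary), reverse=True)

# toy example: scores are hypothetical graspability judgments in [0, 1]
objects = [("pen", 0.9), ("house", 0.05), ("mug", 0.8), ("car", 0.2), ("coin", 0.55)]
for name, score in extreme_first_order(objects, boundary=0.5):
    label = "graspable" if score > 0.5 else "not graspable"
    print(f"show {name}: {label}")
```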

Teaching Strategies
In the work by Kaochar et al. (2011), a WoZ study was conducted in which the learner controls an unmanned aerial vehicle and participants taught an 'electronic learner' missions by sending commands (see Table 4). Forty-four university students participated in the study. The interaction between the participant teacher and the robot learner (i.e., the wizard) was mediated by a simulated environment in the Intelligence, Surveillance and Reconnaissance domain, including a terrain map with cargo boats and fishing boats and the actions of retrieving object information or retrieving object radiation levels. The authors developed an interface with which the participants could choose between different teaching methods: demonstration, giving examples, reinforcement, and testing the learner. The participants' task was to teach the learner how to distinguish the two types of boats, to use the radiation sensor only on cargo boats, and to report the readings. Data in the form of interaction transcripts were analyzed and showed all three modes of instruction in more than half of the sessions. Also, testing was done in between teaching throughout the sessions instead of in one block at the end of the interaction.
Kim et al. (2009) describe a study on human vocalization and affective expression in a human-robot teaching scenario (see Table 5). During the study, 27 participants teach Pleo dinosaur robots (see Fig. 1d) which buildings to demolish from a set with marked and unmarked buildings. There were three pairs of buildings in the course of each trial, each pair consisting of one marked and one unmarked building. The robot was remote controlled. The authors hypothesized that users provide affective guidance, feedback and affective vocalization, and additionally that the interaction history, in the form of learner performance, affects participants' vocalizations and prosody. The results confirmed the hypotheses. Vocalizations were equally distributed among the feedback, direction and guidance categories. Humans tend to say less to a continuously succeeding learner. In case a learner was struggling before, humans vocalize more. The authors conclude that humans take the history of the learner into account.

Questions
Cakmak and Thomaz (2012) evaluated which robot queries (identified in a prior human-human interaction study also reported in this work) are preferred by users after they have shown the robot different actions by moving its arms (see Table 6). The 18 participants who entered the study had to physically move the right arm of a full-size humanoid robot torso (see Fig. 1e) standing in front of a small table in order to teach it three different actions: pouring cereal into a bowl, adding salt to the salad, and pouring soda into a cup. Each action was demonstrated twice by each participant for a total of six demonstrations. The participants gave speech commands to the robot, for instance marking the start and end of a demonstrated movement. The robot, Simon, acknowledged the commands via speech and head nods and followed its own moving hand with its gaze during the demonstrations. The focus of the study did not lie on these demonstrations but on the questions the robot asked participants after each demonstration was finished. Upon being asked if it had any questions (as part of the experimental protocol), the robot asked one of three different query types: a label question (e.g. "Can I pour cereal like this?"), a demo question (e.g. "Can you show me how to add salt from here?"), or a feature question (e.g. "Can I pour [soda] at different heights?"). These types were found in a human-human interaction experiment (Cakmak and Thomaz (2012), Section 4) prior to the HRI study (Cakmak and Thomaz (2012), Section 5). Participants filled out a questionnaire on their preferences and perception after the study. Their answers revealed that feature queries were perceived as smartest and demo queries as least smart. Furthermore, feature queries seem to be harder to interpret than label queries.
Rosenthal et al. (2009) also investigate participants' answers to a robot's questions (see Table 7). They present a method and guidelines for developing robot questions. In a study, they evaluated the correctness of user answers based on the content of robot questions. The study design involves a small humanoid robot (see Fig. 1m) that is watching 37 participants fulfilling a building task with blocks. During the task, the robot asked questions that included different levels of contextual information, predictions, the type of the robot's uncertainty, and object features. Participants are split among a set of conditions reflecting combinations of these levels: the robot offers a correct prediction about the block the participant is using (e.g., "It might be a cube.") vs. no prediction; the robot says that it "cannot determine the shape" vs. it does not mention its uncertainty; and the robot asks the human an additional question about which features it used to classify the block (e.g., "Why is the block a cylinder?") vs. no additional question. Although the robot is operated by a wizard, it automatically follows colored objects. The correctness of the users' answers is annotated, presumably from video data. The results revealed that the more contextual information is provided by the robot, the fewer errors are made in answering the questions. Furthermore, the more suggestions are provided by the robot, the more user answers are correct. Based on the results, the authors suggest that if robots provided information about uncertainty, global context, prediction, and selected features when asking questions, the human answers would contain the lowest number of errors.

Manipulations of the Learner
Nagai and colleagues studied human teaching in variations of the same human-agent interaction setup. Human teachers demonstrated object manipulation actions, like nesting differently sized cups, to a simulated agent, a video representation of a cartoon head shown on a screen. The agent is called Akachan and is designed to have baby-like features (see Fig. 1g). The head does not move on the screen, but gaze direction is revealed through eye movement. The eye gaze is controlled via a bottom-up saliency model that takes color, intensity, orientation, motion, and flicker into account. The robot attends to the most salient object in its view (an external camera is used). Muhl and Nagai (2007) tested participants' reactions to interruptions in the robot's attention (see Table 8). Twenty-two participants were sitting opposite a screen at a table and demonstrated actions to the robot shown on the screen. During action demonstrations, an experimenter introduced an artificial distraction into the attention system which was more salient than anything else in the robot's view. This variation caused the robot to fix its gaze on the upper right corner of the camera image, away from the relevant scene. The authors used ethnomethodological conversation analysis to qualitatively investigate the interactions captured on video. They found that the most common reactions to the variation were 1) following the line of the robot's gaze, 2) attracting attention to oneself via speech, movement or noise, 3) attracting attention to the object by pointing or showing, 4) getting into the robot's attention by calling the robot, moving in its line of sight or approaching it, and 5) reflecting for oneself, reducing activity or testing hypotheses. Overall, the robot was accepted as a proactive communication agent.
Vollmer et al. (2009a) compared teaching behavior towards children, adults and robots (see Table 9). In the HRI part of the study, the participants also demonstrated different object manipulation actions, like the nesting of cups, to the Akachan robot simulation (see Fig. 1g). The robot gazed according to the underlying attention mechanism (saliency model). The teaching behavior of 12 participants in the HRI study was compared to the teaching behavior of 8 parents presenting the same actions to their 8 to 11 months old children in the same setting and to the teaching behavior of 12 adults presenting the actions to another adult. The authors analyzed the movement trajectories of the parents' hands, applying measures with which Motionese (i.e., movement modifications toward children, cf. Brand et al. (2002)) can be assessed (range of motion, velocity, pauses, roundness, etc.). Additionally, they aimed at analyzing the contingency of the interaction by investigating participants' gaze behavior (frequency and length of gaze bouts to the robot, the object, elsewhere, etc.). The analyses revealed that participants modified their movements toward the robot even more than towards children and presented it with less variability; however, they did not monitor the robot as they monitored their child, and even less than they monitored another adult.

The same dataset of HRI was used to compare against a follow-up HRI study with 14 participants (see Table 10), for which an iCub robot (see Fig. 1h) was equipped with the same saliency system as the Akachan simulation. The iCub robot is a full-size humanoid robot modeled after the proportions of a 3.5-year-old child and also possesses the same type of infant-like facial features as Akachan. It either moved only its eyes or the whole head to the most salient location. Lohan et al. (2010) focused on the contingency measures and found that the iCub robot was monitored more than the simulation, and that it was monitored most in the condition with the moving head. The authors ascribed the results to the iCub's higher degree of embodiment. Fischer et al. (2012) conducted a linguistic analysis of the data and found that features concerning interpersonal relationships (checks of understanding, imperatives, etc.) and their amount indicate more interactivity with the embodied robot. Additionally, they found that the body of the robot indicates the robot's capabilities.
Vollmer et al. (2014), Pitsch et al. (2013), and Vollmer et al. (2013b) analyze a study in which an inexperienced user sits across from a Honda full-size humanoid robot (see Fig. 1i) at a table and demonstrates actions with different objects (see Table 11). Actions were chosen to be either about the movement or about the end position. 59 participants showed the robot, for example, how to clean a window with a sponge (movement) or how to hang up the phone (end position). The study was designed to investigate the impact of a robot's feedback on teaching. The feedback signaled either understanding or misunderstanding via two modalities. After one teacher action demonstration, the robot reproduced the action by either imitating it (reproducing the movement completely, which is correct for "movement" actions and incorrect for "end position" actions) or emulating it (reproducing only the end position correctly, which is correct for "end position" actions and incorrect for "movement" actions). Also, the robot's social eye gaze during the teacher's demonstrations reflected the future reproduction condition: the robot's gaze either followed the movement of the object (in the imitation condition) or anticipated an end state, shifting the gaze away from the object towards a fixed end point (in the emulation condition). Two other eye gaze behaviors, in which the robot gazed randomly at relevant locations during action demonstrations or did not move the head at all, served as baselines. Eye gaze patterns in the random condition were modeled after the gaze of infants. Each participant demonstrated an equal number of movement and end position actions and saw both imitation and emulation for both kinds of objects; however, for any one action the replication was the same across all demonstrations, meaning that the robot did not learn or alter its replication behavior according to the teacher's corrections. Again, movement and eye gaze measures were utilized to evaluate the data. Questionnaires and interviews with participants were also used. The results show that human teaching strategies are transferred to teaching robots. Action replication and eye gaze influenced the teacher's behavior.
Exactly how seemed to depend on the teacher's monitoring behavior.
In another study, in which 16 participants teach actions to a small humanoid robot (see Fig. 1l), Nagai et al. (2010) investigate how the robot's infant- or adult-like attention influences teaching behavior (see Table 12). The robot was sitting on the table and participants were facing the robot. They showed the nesting-cups action to the robot, which either gazed according to the before-mentioned saliency model (infant-like gaze condition) or was controlled by an experimenter (i.e., a wizard) who chose a gaze target in the robot's camera image. The targets were selected according to simple rules (following a cup which is moved towards the goal, anticipating the goal when a cup is held but not moved towards the goal, and gazing at the teacher when no cup was held), reflecting the sophisticated gaze behavior of an adult (adult-like gaze condition). The authors found that participants modified their movements (Motionese) when the robot was exhibiting infant-like gaze.
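As a rough illustration of how a bottom-up, saliency-driven gaze target can be computed, the following sketch combines precomputed feature maps (e.g. for color, intensity, orientation, motion, and flicker) and looks at the most salient pixel. It is a loose, hypothetical sketch and not the saliency model actually used in the reviewed studies (Nagai et al. 2008).

```python
import numpy as np

def saliency_gaze_target(feature_maps, weights=None):
    """Pick a gaze target as the most salient pixel of a combined map.

    feature_maps: dict of equally sized 2-D arrays, one per feature
    channel, each normalized to [0, 1].  Illustrative sketch only.
    """
    maps = list(feature_maps.values())
    weights = weights or [1.0] * len(maps)
    combined = sum(w * m for w, m in zip(weights, maps)) / sum(weights)
    y, x = np.unravel_index(np.argmax(combined), combined.shape)
    return x, y   # image coordinates the robot should look at

# toy example with random channels standing in for real feature maps
h, w = 48, 64
channels = {name: np.random.rand(h, w) for name in
            ["color", "intensity", "orientation", "motion", "flicker"]}
print(saliency_gaze_target(channels))
```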
In the study by de Greeff and Belpaeme (2015), participants played a language game with a robot that either signals its learning preference or does not (see Table 13).
In the setting of this study, each of the 41 participants sat across from the LightHead robot, consisting of a semi-transparent mask mounted on a 6 DoF robotic arm onto which an animated character is projected (see Fig. 1j). The top of the table standing between the two interaction partners was a touch screen on which cards depicting different animals were shown. The teacher chose one of the cards and communicated one of seven possible labels, shown on the screen, for the animal on the card to the robot. The robot then guessed which of the pictures the teacher meant and received positive or negative feedback. Each participant played 50 of these guessing games. The robot's behavior was manipulated such that for half of the participants it showed a preference for a less familiar animal card through gaze and an utterance revealing its interest. The main results relate to the system's learning performance, the participants' choice of cards, their gaze behavior, and a post-experimental questionnaire. The study revealed that teaching behavior seems to be tailored to the robot's performance, that female participants responded more to the robot's preferences, and that the robot showing a learning preference improved the quality of examples.

Pitsch et al. (2012) and Fischer et al. (2013) analyzed participants' teaching when the robot shows saliency-based gazing or designed reaction patterns (see Table 14). The robot employed in this study is an iCub humanoid robot (see Fig. 1h). 24 participants took part in the study and presented objects to the robot. The objects were cubes of different sizes with visual markers on each side so that the robot could detect them. For 12 participants, the robot gazed according to a saliency model; for the other 12, the robot detected the participants' gaze directions and reacted with a pre-scripted behavior pattern mirroring the participant's behavior: 1) when the participant gazed at the robot, the robot gazed back at the participant and smiled, 2) when the participant gazed at the object, the robot also looked at the object and smiled, 3) when the participant gazed elsewhere, the robot gazed at random locations with a neutral face, and 4) when the participant presented an object, the robot looked at the presented object and pointed to it. A linguistic analysis by Fischer et al. (2013) revealed that the contingent robot is presented with smaller chunks of information. An ethnomethodological conversation analysis by Pitsch et al. (2012) showed how the responsiveness of the robot at the beginning of an interaction shapes the participants' perception of the system's competences and their overall teaching behavior. The first 20 seconds of an interaction led to a stable tutoring style: monologic presentations if the robot was not responsive at the beginning, or dialogic presentations if it was.
In the study conducted by Lohse et al. (2013), two participants interacted with a robot simultaneously and taught it object labels (see Table 15). The robot is a composition of a GuiaBot and a PatrolBot by MobileRobots and does not have a humanoid appearance, but a screen on which eyes can be displayed (see Fig. 1f). The study was a Wizard of Oz study in which one of 40 pairs of participants at a time stood at a round table with different objects, across from the robot. The robot was manipulated to either perform well or perform badly, such that it learned right or wrong object labels more or less frequently. When a participant told the robot an object name, it replied "This is the [label] then", with the label being either correct or incorrect. The options for utterances were selected by the wizard. Each pair of participants taught in both conditions with two separate sets of objects (with the same types of objects but different appearances). The eyes displayed on the robot's screen were remote controlled to gaze at the person who was speaking, or in between the participants and at the table when no one was speaking. The video and audio data of the study were analyzed automatically using a tool for acoustic packaging that associates motion and speech segments (Schillingmann et al. 2009). The study showed that teaching behavior when the robot performed badly was marked by longer utterances, more motion peaks, and a higher deviation from the mean prominence. Moreover, participant pairs aligned strongly with each other, with the strongest alignment in the bad performance trials.
Lütkebohle et al. (2009) present a video study in which participants answered questions about a video of an HRI in which the robot is taught object names and how to grasp those objects (see Table 16). The robot in the video is composed of two robot arms, fixed to the ceiling of the laboratory, with a left and a right hand (see Fig. 1k). In the background stands a humanoid robot torso that does not move. Arms and hands are designed for object manipulation and grasping. The interaction between user and system is mixed-initiative. The system actively asks for object names and grasps, which structures the interaction. However, the user can also take the initiative by issuing verbal commands. In the main video study, the authors showed 7 minutes of interaction of an experienced user (i.e., a developer) with the robot to 10 participants and investigated the participants' reactions to dialog acts with varying initiative. The video was paused at relevant points in time and participants answered questions referring to the respective situation in the video ('What would you do now?', 'How would you instruct/correct...?', etc.). Analysis of the participants' responses revealed that in cases where the system provided guidance by asking for the object label, users were certain and consistent in their responses. In contrast, users took much more time in answering the system's question on how to grasp the object. The results show that clear task structuring helps to support inexperienced users in interacting with and teaching the system.
Yu et al. (2010) present a study in which 25 participants teach object labels to a robot that either masters joint attention or behaves randomly (see Table 17). The authors aimed at taking the time course of the interaction into account. The robot in this study is a small humanoid robot torso on a stand, with a 2 DoF movable head (eyes, eyebrows and lips) and arms (see Fig. 1n), placed on a table at which the participant was sitting.
Only the head was actuated during the study; the rest of the body remained fixed, facing the participant. The robot's behavior was manipulated for two conditions. In a 'following' condition, the robot followed the participant's gaze to the object it detected to be in the participant's visual attention. In a 'random' condition, the robot gazed according to pre-programmed random sequences of head movements followed by gaze pauses of random length. The setting in which participants interacted with the robot was a laboratory setting designed for visual processing, with all-white furniture and backgrounds (i.e., white cloth covering the scene); participants also wore a white lab coat and a head-mounted eye tracker with a camera. Two sets of three objects, each with corresponding object labels, were provided to the participants. They taught labels for one set, however they liked, for one minute and then switched to the other set, alternating between sets for a total of four demonstrations and four minutes. Visual processing of the participants' first-person video data extracted five regions of interest, including the three objects, the participants' hands, and the robot. Analyzing visual data, gaze data, and speech, the study showed that the robot was monitored more in the random condition. With respect to eye fixations before, during, and after naming utterances, participants in the following condition showed behavior similar to that observable in HHI. However, in the random condition their behavior was unnatural compared to HHI.
Knox et al. (2012) conducted experiments with 57 participants evaluating how human trainer behavior is affected by taking the role of teaching or critiquing (see Table 18). They analyzed the impact of the agent's behavior on the teacher's feedback frequency. In a first experiment, an agent-based reinforcement learning framework called TAMER is used as a platform for the experiment. The agent's task is to learn the Tetris game. The human participants are able to provide positive or negative feedback by pressing keys. No differences between participants in the teaching and the critiquing role were found regarding the agent's performance in the game. With respect to teaching behavior, humans provided more positive than negative feedback. In a second experiment, the agent tried to manipulate the teacher by behaving worse when the feedback frequency was reduced. The results indeed reveal a higher feedback frequency in this case compared to feedback toward a normally behaving agent. However, the changed frequency did not improve the agent's learning performance.
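The core idea of learning from such keypress feedback can be illustrated in a few lines of code: the agent fits a model of the human's reward for state-action pairs and acts greedily on it. The sketch below is a strongly simplified, hypothetical illustration in the spirit of TAMER, not the published implementation.

```python
from collections import defaultdict

class HumanRewardModel:
    """Simplified sketch of learning from human feedback: the agent
    estimates the human's reward per (state, action) pair and acts
    greedily on that estimate.  Illustrative toy only."""

    def __init__(self, actions, step_size=0.2):
        self.h = defaultdict(float)     # estimated human reward per (state, action)
        self.actions = actions
        self.step_size = step_size

    def act(self, state):
        # choose the action the human is predicted to reward most
        return max(self.actions, key=lambda a: self.h[(state, a)])

    def feedback(self, state, action, signal):
        # signal is +1 or -1, e.g. from a key press by the trainer
        error = signal - self.h[(state, action)]
        self.h[(state, action)] += self.step_size * error
```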


Type and Focus of Studies
A large part of the studies we analyzed manipulate the type of learner by including humans and different robots, by manipulating the feedback of the robot (and most often its attention), or by manipulating how well the learner performs. For these studies, the teaching user's behavior is measured, including speech, attention, and movements. Some works focus on how a robot learner asks questions and evaluate user answers and preferences. Further works study how users provide examples or feedback and, when they are provided with different modes of instruction, which ones they make use of. Most of the studies in our analysis have a condition-based design (11 of 18 papers, 61%) as opposed to being of an exploratory nature (7 papers, 39%).

Learning and Teaching
The main teaching/learning contents we identified are essentially actions (10 of 18 papers, 56%) and labels (8 papers, 44%). Whereas actions comprise mostly movements and sequences of known elements, labels range from linguistic labels to object affordances, properties and concepts. All of the works study teaching interactions, but the robots only actually learn in five (28%) of the studies. For robots, "not learning" means that the systems are not endowed with a learning mechanism or that it is at least not used in the scope of the study. This can have two possible reasons: the effort for implementing a suitable learning algorithm is too high, or a learning mechanism would introduce unwanted variability into the study. We will detail the former in the next section (Section 6). The users participating in the studies have the task of teaching the robot. Manipulative actions are physically demonstrated (in eight of the 18 papers, 44%), and objects are shown and named (in five papers, 28%). When the learner is an explorative learner, users give feedback (e.g. a binary reward for a reinforcement learner, in three papers (17%), or free feedback, in one paper (6%)).
In the analyzed studies, the robot seldom performs the taught actions itself. Mainly, the robot just observes demonstration after demonstration without taking a performance turn.

Methods of Analysis Used
In the reviewed literature, we find more quantitative than qualitative methods of evaluation. The observed qualitative methods (in 27.78% of the studies) are techniques from ethnomethodological conversation analysis. Quantitative methods (in 88.89% of the studies) include the investigation of the distribution of examples or feedback users provide to the learner (order of examples, number of examples per object, positive vs. negative examples/rewards, feedback frequency (to certain objects), etc.), measures on the object/hand movement (speed, motion vs. pauses, motion peaks, etc.), measures on the participants' eye gaze behavior (frequency and length of gaze to relevant positions, etc.), and linguistic features (verbosity, attention-getting, structure, etc.).
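As an illustration of the kind of movement measures listed above, the following sketch computes a few toy Motionese-style quantities (range of motion, mean velocity, number of pauses) from a 2-D hand trajectory. The thresholds and definitions are invented for illustration; each reviewed study defines its own measures.

```python
import numpy as np

def motionese_measures(positions, dt=1/30, pause_speed=0.02):
    """positions: array of shape (T, 2) in meters; dt: frame interval.
    Returns a few toy movement measures; illustrative only."""
    positions = np.asarray(positions, dtype=float)
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    motion_range = positions.max(axis=0) - positions.min(axis=0)
    in_pause = speeds < pause_speed
    # count contiguous runs of below-threshold speed as pauses
    n_pauses = int(np.sum(in_pause[1:] & ~in_pause[:-1]) + in_pause[0])
    return {"range_x": motion_range[0], "range_y": motion_range[1],
            "mean_velocity": speeds.mean(), "n_pauses": n_pauses}

# fake 5-second trajectory sampled at 30 Hz, standing in for tracked hand data
trajectory = np.cumsum(np.random.randn(150, 2) * 0.01, axis=0)
print(motionese_measures(trajectory))
```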

Discussion of the Main Results
The main results found in the studies can again be structured into themes analogous to the foci of the studies.

Choice of Examples
The first theme relates to the examples or feedback that teachers provide to the robot learner. In the field of Psychology, Avrahami and colleagues introduced the paradigm of "teaching-by-examples", with which they conducted a series of experiments investigating category learning (Avrahami et al. 1997). They found that the most used strategy to teach a category was to first give several positive examples which were ideal, then give an ideal negative example, and after that move to examples around the decision boundary. In their experiment, participants could choose examples from 20 items. On average, participants showed M = 7.22 examples to the learner, and 84% showed at least one negative example. The literature we review here supports these findings: In the experiment by Khan et al. (2011), which uses the same paradigm, participants chose from 31 items and showed on average M = 8 examples. 90% of participants showed negative examples, numbers very similar to the ones found by Avrahami et al. (1997). In the HRI experiment, the strategy applied most often by participants is described as starting from extreme (i.e., ideal) examples and then moving toward the decision boundary, which the authors state to be evidence of the 'curriculum learning principle' (Bengio et al. 2009) and which reflects the same strategy found in human-human interaction. Such ideal examples can be viewed as simpler than examples closer to the decision boundary. Thus, the two studies show that simple examples are provided to the learner first, which is supported by the study by Thomaz and Cakmak (2009). This structuring of teaching content with respect to complexity has also been considered in education, for example for mathematics education in secondary school (Zodik and Zaslavsky 2008), and has been found to facilitate learning in studies on adult second language learning (McCandliss et al. 2002). Simulations of artificial learners equipped with mechanisms of intrinsic motivation/curiosity have also been found to first learn simple situations before advancing to more difficult situations (Oudeyer and Kaplan 2004). These learners autonomously explore their environment and choose what to learn themselves. Thus, the observed human teaching strategy can be optimal for robot learners as well, maximizing their learning progress. Here, the reviewed HRI studies seem to merely confirm previous findings from other fields. However, in the human-human interaction experiments conducted in Psychology, only small children would have knowledge incomplete enough to warrant the learning of simple categories, and in interaction experiments they do not allow for the necessary control. Therefore, in these human-human experiments, either abstract, artificial categories have been designed with the goal of being unfamiliar to the learner (Avrahami et al.
1997), or the teacher is told to imagine a fictive alien learner unfamiliar with real-world (human) categories (Avrahami and Kareev 1990). In contrast to studies with imagined learners, studies using artificial categories allow for testing the real human learner's learning progress. A robot learner, on the other hand, might inherently be unfamiliar with real-world categories and thus, in theory, makes for a more natural interaction for the human teacher as subject. Here, learning performance can also be measured, even if this machine learning (ML) performance might not reflect human performance. Even though ML algorithms are modeled after human learning mechanisms, there exist marked differences between ML and natural human learning. Human learning mechanisms are not clearly understood in their entirety, such that ML techniques incorporate certain features of human learning but of course are not as advanced and complex. Consequently, ML has been shown to not always function optimally with the input human teachers naturally provide (Cakmak and Thomaz 2010).
In principle, however, these HRI studies can help to identify features of human teaching which could be particularly efficient, neutral, or even detrimental for learning (see Table 19). For instance, regarding the finding by Cakmak and Thomaz (2012) that teachers perceive feature queries of the learner as smartest, such queries might create a false idea about the learner's (abstraction) capabilities. However, these capabilities are necessary for posing a feature query autonomously; thus, feature queries might not only be perceived as smartest, they might actually be smartest. As another example, de Greeff and Belpaeme (2015) found that women attuned more to the preferences signaled by their robot, which is beneficial for learning. This gives rise to some delicate questions: Are women better teachers? More provocatively: Are men unfit for this whole professional field? A higher variability of examples and a lower agreement between participants for active learners hint at a more individually tailored teaching style. Such differences in teaching have also been found for robot simulations versus embodied robots that are either more or less active (Lohan et al. 2010; Fischer et al. 2012; Pitsch et al. 2012; Fischer et al. 2013). Teaching seems to be more recipient-oriented when robots are active.

Adaptation to the Learner
In natural HHI, teachers modify their behavior. When tutoring infants, for example, people modify their speech in pitch and intensity and use simpler words, which has been termed 'Motherese' (Strauss and Ziv 2012); they also move more slowly, make more movement pauses, and their movements have a larger range and are less round (Brand et al. 2002; Brand and Tapscott 2007; Rohlfing et al. 2006). These latter modifications in movement have been termed 'Motionese' (Brand et al. 2002). Such modifications in tutoring are preferred by infants and have been suggested to facilitate and even enable learning (Brand and Shallcross 2008; Pitsch et al. 2014; Kemler Nelson et al. 1989). So there might exist something like registers which we make use of according to the learner whom we teach. Herberg et al. (2008) found that teaching behavior differs depending on the audience, even when the audience was represented as mere pictures of an adult, a toddler, and a computer. Pitsch et al. (2012) showed that the first few seconds of interaction are most important and seem to shape a register which persists for the remainder of the teaching session. However, it must be noted that current HRI studies are of short duration, such that we do not know how fixed such a register really is. Also, for robots this register might be novel, or assumptions might not meet reality and a register has to manifest first (in the first few seconds?). Further research on adult-child teaching/learning interactions found that both Motionese behavior (Vollmer et al. 2009b) and learner feedback (Vollmer et al. 2010) are age-dependent, such that either there are many registers with fine-grained nuances or registers incorporate some degree of flexibility.
In contrast to the idea of registers, from an interactive view, teaching is bidirectional (De Jaegher et al. 2010), and the behavior of the learner is very important for teaching; it has been suggested to shape the teacher's conduct but also to influence the teacher's mental representations of the teaching content (i.e., what they teach) (Vollmer et al. 2014). In adult-child interaction, learner feedback has been found to operate on different levels. More specifically, there is continuous involvement (e.g. gaze) and feedback at specific points within the structure of the interaction (e.g. through pointing gestures at objects) (Vollmer et al. 2010). This learner feedback directly influences the teacher's behavior in 'interactional loops', for example: the teacher checks the child's attention, the child's gaze behavior directly influences the teacher's instantiation of a movement, which, in turn, influences and orients the learner's gaze online (Pitsch et al. 2014). In teaching interactions with young infants, monitoring and guiding the learner's attention or engagement is a very important precondition for successful communication. Micro-adaptations like these, which have been observed at specific points in the interaction and in moment-to-moment analyses (Pitsch et al. 2013; Pitsch et al. 2014), could be the basis for the Motionese behavior modifications measured on a coarser time-scale, such as over the course of a whole interaction (as in Vollmer et al. 2009a; Vollmer et al. 2013a; Vollmer et al. 2014; Nagai et al. 2010).
Concerning a robot learner's random gaze, even if some of its properties (like direction, frequency and duration) are modeled after infants' gaze behavior (Vollmer et al. 2014), it is mostly unrelated, meaning that its timing and direction are independent of the current interaction and do not refer at all to the teacher's behavior. We say that it is mostly unrelated because temporally contingent situations might and do occur by chance (Pitsch et al. 2013). The teacher will most likely interpret an unrelated gaze behavior as inattentive (even more so when it was temporally contingent and reactive before by chance) and respond with an increase in monitoring of the learner in order to be able to repair the learner's attention. This is a possible explanation for why a random robot is monitored more than one with joint attention capabilities (Yu et al. 2010). Also in HHI, young infants with their incomplete attention capabilities are monitored more than adults because their attention often needs to be regained and repaired (Vollmer et al. 2009a). The difference in monitoring behavior toward the Akachan simulation (less monitoring, Vollmer et al. 2009a) could have three possible explanations, which are not mutually exclusive. a) As the authors suggest, the purely reactive saliency mechanism does not sufficiently allow the teacher to understand the state of knowledge of the learner (no learner feedback at specific points within the structure of the interaction), which results in a decrease of "interactive contingency" or monitoring of the learner's attention. b) The employed artificial saliency mechanism has been designed to simulate the bottom-up attention of infants (Nagai et al. 2008).
However, there are either discrepancies between this saliency-based gaze and the gaze of infants, or the saliency mechanism rather models older infants' gaze behavior, as it follows a shown movement at a certain speed, causing exaggerated movements and pauses that maintain the learner's gaze on the moved object. Compared to a random gaze mechanism, it does not cause situations with gaze to irrelevant locations and therefore makes extensive monitoring unnecessary. c) Vollmer et al. (2009a) used a robot simulation for their experiment. In a follow-up study, Lohan et al. (2010) studied the tutor's gaze behavior as a function of the robot's degree of embodiment and found more monitoring towards the embodied robot with a moving head, which was equipped with the same saliency mechanism as the simulation.
Human learners often actively request the information they need from the teacher, but they also provide the teacher with subtle social cues that make their state of understanding apparent, which, in turn, constitutes the basis for tutoring modifications. Respective mechanisms in robots convey (i.e., make transparent) the current state of the technical system to the teacher. Otherwise, the state of a robotic system is difficult if not impossible to infer for inexperienced users. These mechanisms have been termed "transparency" mechanisms by Thomaz and Breazeal (2006) and have only been considered recently. These transparency mechanisms stem from a technical system view and depart from existing ML algorithms. Since we can assume that the teacher does not naturally provide the optimal examples for most of these algorithms, respective studies aim at investigating natural human teaching with respect to its compatibility with current ML approaches. Consequently, either ML algorithms have to be adapted according to study outcomes (which is rather difficult), or the adaptability of human behavior could be exploited using transparency mechanisms. Transparency mechanisms aim at enabling the human teacher to understand the current state of the robotic system and, for instance, signal uncertainty or attention with gaze (Thomaz and Breazeal 2008; Vollmer et al. 2014), or include robot state information when asking questions (Rosenthal et al. 2009). More straightforward versions of this mechanism include the explicit signaling of a preference for a certain learning input (de Greeff and Belpaeme 2015; Lütkebohle et al. 2009), explicitly asking the teacher questions, or requesting actions or feedback (Cakmak and Thomaz 2012). In these cases, the learner selects what information to request using algorithms that maximize information gain and progress (cf. 'intrinsic motivation' (Oudeyer et al. 2007) and 'active learning' (Lopes et al. 2009)). These approaches are mostly evaluated with respect to learning performance, as the general aim is to enable the human teacher to understand the current state of the robotic system, to which the teacher, in turn, attunes their teaching. Transparency approaches should and do result in better learning performance, assuming the teacher does not naturally provide the optimal examples for the learning algorithm. Some studies focus on the teacher's perception of the learner and found that learners with certain transparency mechanisms are perceived as smarter (feature queries (Cakmak and Thomaz 2012), questions containing robot state information (Rosenthal et al. 2009)). If the learner's behavior is not transparent, the tutor tests the learner's knowledge throughout a teaching session if given the opportunity to do so (Kaochar et al. 2011).
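As a small illustration of how a learner might select what to ask about, the following sketch implements uncertainty sampling, a basic active-learning criterion: query the item whose predicted label probability is closest to 0.5. It is a generic, hypothetical example, not the specific criteria used in the cited works.

```python
import numpy as np

def pick_query(candidate_items, predict_proba):
    """Choose which item to ask the teacher about: the item whose
    predicted class probability is closest to 0.5 (maximum uncertainty).
    `predict_proba` is any function mapping an item to P(label = 1).
    Illustrative sketch only."""
    uncertainties = [abs(predict_proba(item) - 0.5) for item in candidate_items]
    return candidate_items[int(np.argmin(uncertainties))]

# toy usage: scores standing in for a classifier's confidence per object
items = ["cup", "sponge", "phone", "tray"]
scores = {"cup": 0.92, "sponge": 0.48, "phone": 0.15, "tray": 0.61}
print(pick_query(items, lambda x: scores[x]))   # -> "sponge"
```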
Another line of research does not primarily focus on the technical system, but is instead concerned with the inner workings of the interaction: it concentrates on the robot's behavior or feedback during teaching interactions and what impact it has on human teaching. The corresponding literature draws inspiration from the above modifications in human-human teaching and learning interaction and focuses on specific individual features that are singled out in HRI studies. So far there exists only a small number of such studies. They already provide first hints for further research in this direction, which has the potential to identify those learner behaviors and features that have a direct impact on human teaching. For example, in the reviewed literature, an overall result pertains to the importance of the learner's attention and feedback in teaching. Distractions of the robot's attention cause repair activities (Muhl and Nagai 2007). A randomly gazing robot receives more attention from the teacher, but the teacher's behavior is unnatural with respect to eye fixations and their coupling with the naming of objects, whereas for a learner that displays joint attention by following the teacher's gaze, eye fixations and naming are similar to HHI (Yu et al. 2010). A learner that gives responsive feedback at the beginning of a teaching interaction receives a dialogic presentation, whereas a monologic teaching style is adopted when the robot is not responsive at the beginning (Pitsch et al. 2012). A robot with causal feedback of gaze, facial expression, and pointing is presented with smaller chunks of information than when the robot is merely showing attention through a saliency mechanism (Fischer et al. 2013). Further, feedback revealing understanding influences the number of demonstrations: when feedback hints at an incorrect understanding, teachers repeat their demonstrations more often, but they also modify their movements and, for instance, demonstrate more slowly (Vollmer et al. 2014). When the learner is monitored, shifting visual foci are countered with adjusted motion pauses, speed, and height in the demonstration (Pitsch et al. 2013). When the robot's learning performance is manipulated, slowly learning and badly performing robots are presented with longer utterances, more motion peaks, and more vocalizations (Lohse et al. 2013; Kim et al. 2009). When taught by multiple teachers, alignment between teachers is higher for slow learners (Lohse et al. 2013). When the performance depends on the teacher's feedback, feedback frequency increases in order to improve learning (Knox et al. 2012).
Further literature in this field can thus provide a map of which learner modifications influence teaching and which aspects of teaching behavior are affected.

Summary
From the discussion the following main arguments can be derived:
- The inherent lack of world knowledge in robots can be an advantage, since it allows studying teaching even of basic knowledge in a natural setting.
- Machine learning and human learning mechanisms differ, which has to be carefully considered when interpreting experimental results.
- Machine learning can help assess the quality of features in human teaching.
- Robots can help understand teacher expectations.
- We identified two types of perspectives: a) evaluating the learning method, and b) investigating the interaction itself.

Technical Challenges
In this section we describe a number of important technical challenges faced when studying human teaching behavior with robots. These are relevant because they show what is feasible. The main challenges are linked to robot design, costs and resources, learning mechanisms, generalizability, and replicability.

Robot Design
As we have discussed, the teacher adapts to different categories of learners, but also, to some degree, online, such that adaptations to the learning recipient are guided by continuously updated expectations about his/her knowledge and capabilities (cf. recipient design, Sacks et al. 1974). The feedback of the learner shapes the teacher's behavior and the exact instantiation of movements.
Expectations toward robots might also be strongly influenced by common misconceptions and assumptions about the capabilities of robots that might stem from what is suggested by the media.
For the purpose of investigating teaching in HRI, the above dependencies render the properties of the robot as an interaction partner ever more important.

Appearance
A robot is a mechanical or virtual artificial agent. Robots with a human-like appearance are called humanoids, such as the ASIMO developed by Honda or the iCub developed by researchers at the Italian Institute of Technology (IIT). However, robots do not necessarily resemble humans; they range from industrial robots (e.g. assembling cars in factories) to drones. There exist commercially available platforms (ASIMO, Pepper, NAO, Baxter, Pleo, Paro, etc.) as well as various platforms developed at research institutions.
Regarding the appearance of a robot, it has been found that humans predict a robot's capabilities and potential applications based on its visual appearance (Hegel et al. 2009). For a robot interaction partner, teaching seems to be influenced by existing or missing body parts. A robot without a mouth, for instance, is not expected to be able to talk and, similarly, a robot head without a body and arms with hands is not expected to grasp objects in manipulative action demonstrations (Fischer et al. 2012).
Humans tend to anthropomorphize robotic agents based on their appearance. The appearance of a robot shapes how we judge its capabilities and how we behave toward it. Usually robots are designed functionally, meaning that their function defines their design features. A dominant assumption in the research field we discuss here is that a humanoid robot elicits behavior in its human interaction partner that is most similar to HHI. However, it might be necessary to avoid phenomena like the 'uncanny valley' (Mori et al. 2012). The uncanny valley is a hypothesis first asserted by Mori in 1970 that applies to robots. According to the uncanny valley hypothesis, responses to robots become more and more positive the closer the robots are to humans in appearance. However, as their traits become closer to human traits, but are not yet similar enough, the curve drops into a valley where robots are experienced as creepy. This effect is amplified by movement. For example, the androids developed by Hiroshi Ishiguro, such as the Geminoid robots, are among the robots existing to date that appear closest to humans (Nishio et al. 2007). Each Geminoid closely resembles a human model to the extent that they are not distinguishable at first sight (see Fig. 2).
Becker-Asano et al. (2010) investigated the reactions of visitors of a media art event to a Geminoid robot and found that even though reactions to the robot were mostly positive, the visitors' dominant emotion when interacting with it was fear, which can be ascribed to the "android's outer appearance -especially its face -and imperfections of its movements". Half of the works we review (50%) employ humanoid robots in their studies. The rest use only human-like body parts, simulated environments, or animal robots.

Embodiment
In some studies, artificial agents are used (27.78%, e.g., Muhl and Nagai 2007; Thomaz and Breazeal 2008) instead of embodied robots (72.22%). These artificial agents are presented on displays such as computer screens and therefore only have a two-dimensional appearance. Movements in the third dimension or eye gaze directions might thus be more difficult to recognize for their interaction partners. Using them has the advantage of not having to deal with hardware that introduces noise and inaccuracy of movement. There also exist hybrid designs, such as the SociBot by Engineered Arts, where simulations or animated figures are projected onto a partly inactive body (used by de Greeff and Belpaeme 2015).

Robot Behavior
Apart from the outer appearance of a robot, its movements also influence its acceptability. The movements of a robot depend on its hardware and software. For instance, the hands of some robots are not suitable for dexterous manipulation or for lifting weight. These robots thus cannot be used to carry out actions themselves. Additionally, in some robots, software and hardware design issues can cause jittering and jerkiness. Robots which carry out manipulative actions in the studies we described are ASIMO (commercial) in Vollmer et al. (2014), Simon (commercial) in Cakmak and Thomaz (2012), and the bimanual setup (commercial) in Lütkebohle et al. (2009). Additionally, robots are often equipped with a gaze behavior in order to make them seem alive or to simulate attention (e.g., following motion, Khan et al. (2011); saliency, Nagai et al. (2008)).
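For illustration, a saliency-driven attention behavior of the kind mentioned above can, in its simplest form, direct the robot's gaze to the image region with the strongest motion. The following sketch is our own simplification, not the mechanism used in the cited studies; the camera and head interfaces in the usage comment are hypothetical.

```python
import numpy as np

def motion_saliency(prev_frame, frame):
    """Crude saliency map: absolute per-pixel difference between two color frames (HxWxC)."""
    return np.abs(frame.astype(float) - prev_frame.astype(float)).sum(axis=-1)

def gaze_target(prev_frame, frame):
    """Return the (row, col) of the most salient pixel, i.e. the strongest motion."""
    saliency = motion_saliency(prev_frame, frame)
    return np.unravel_index(np.argmax(saliency), saliency.shape)

# usage sketch with a hypothetical camera and head interface:
# prev = camera.read()
# while running:
#     cur = camera.read()
#     row, col = gaze_target(prev, cur)
#     head.look_at_pixel(row, col)   # hypothetical gaze command
#     prev = cur
```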

Perception and Behavior Generation
Some studies, such as imitation learning scenarios, require the robot to learn from action demonstrations. Demonstrations can be provided by performing the action oneself, such that the robot observes the demonstration. Observing the demonstration can be done with visual information processing, for which subjects and objects need to be segmented and tracked online in the camera video stream. A slightly easier variant is to attach a visual or other (externally) tracked marker to the teacher or the object to simplify tracking, or to directly provide only the trajectories as information to the system. Still, the observation of a demonstration entails answering several questions. Given that the robot knows when a teacher is present (who to imitate) and when it is its turn to imitate (when to imitate), it still needs to find out what to imitate (Which sensors are relevant? When did the demonstration start and end? What is important about the action? The goal state? The means?) and how to imitate the action (the Correspondence Problem). The Correspondence Problem refers to the fact that the robot usually has a different morphology than the teacher, due to different embodiments, such that movements of the human body cannot directly be translated. In order to avoid this problem, another method of demonstration is kinesthetic teaching. Here, the robot is put into a compliant mode in which the limbs behave passively in the absence of an external non-gravitational force. The teacher then demonstrates the action by moving the robot's limbs around, and the system records the joint angles over time (see the sketch below). Kinesthetic teaching becomes problematic for systems with many degrees of freedom (DOFs), as it is then difficult to create coordinated, natural movements by moving the robot's limbs.
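As referenced above, a kinesthetic-teaching recording loop can be as simple as sampling joint angles at a fixed rate while the robot is compliant. The sketch below assumes a hypothetical `robot` object with `set_compliant()` and `get_joint_angles()`; real platforms expose this functionality under different names.

```python
import time

def record_kinesthetic_demo(robot, duration_s=10.0, rate_hz=50.0):
    """Record a kinesthetic demonstration as a list of (timestamp, joint_angles).

    Assumes a hypothetical robot API:
      robot.set_compliant(True/False)  -- gravity-compensated, freely movable limbs
      robot.get_joint_angles()         -- current joint configuration (list of floats)
    """
    trajectory = []
    robot.set_compliant(True)          # teacher can now move the limbs by hand
    start = time.time()
    try:
        while time.time() - start < duration_s:
            trajectory.append((time.time() - start, robot.get_joint_angles()))
            time.sleep(1.0 / rate_hz)
    finally:
        robot.set_compliant(False)     # stiffen the joints again
    return trajectory
```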
Speech processing in robotics is largely unsolved for unstructured interactions. In contrast, command recognition for expert users wearing a headset can be realized with relatively little effort using recent speech recognition frameworks. Changing the recording setup by placing the microphone inside the robot already creates first challenges: the mechanical noise of the robot and other environmental noise are also recorded. Speech recognition might introduce further variability to the setup depending on the clarity of speech and the accent of the human interacting with the system. Subsequently, spoken action descriptions would have to be parsed by the system. This requires further speech processing and dialog handling; for example, if the user's utterances cannot be interpreted, the system might need to ask for repetition or clarification. We observed that most studies avoid these problems either by strictly limiting the vocabulary, by relying on Wizard of Oz techniques, or by using different means of input, such as buttons or touch screens. Speech production is typically realized using a speech synthesizer and pre-scripted utterances. In Wizard of Oz scenarios, the wizard typically triggers the synthesis manually.
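To illustrate the strictly-limited-vocabulary option, the following minimal sketch maps recognized text to a small command set and otherwise signals that a clarification request is needed. The command names and the clarification behavior are illustrative assumptions, not taken from any reviewed system.

```python
# Minimal limited-vocabulary command matcher: maps recognized text to a command
# or signals that a clarification request is needed. Commands are illustrative.
COMMANDS = {
    "start": ("start", "begin", "go"),
    "stop": ("stop", "halt", "enough"),
    "repeat": ("repeat", "again", "once more"),
    "label": ("this is", "it is called"),
}

def parse_command(recognized_text):
    """Return the matching command name, or None to trigger a clarification request."""
    text = recognized_text.lower().strip()
    for command, keywords in COMMANDS.items():
        if any(keyword in text for keyword in keywords):
            return command
    return None

# usage sketch:
# command = parse_command(asr_result)                  # asr_result from any recognizer
# if command is None:
#     robot.say("Sorry, could you rephrase that?")     # hypothetical speech output
```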

Interaction Capabilities
An interaction with a robot partner tends to become dull quickly. The stable behavior of the robot is often too predictable and inflexible, and it lacks components that keep the tutor motivated and the interaction ongoing. The studies we reviewed found that a robot has to provide appropriate feedback in interaction (Vollmer et al. 2009a) and needs to be responsive, because expectations are shaped during interaction, with the first few seconds being most important (Pitsch et al. 2012). Furthermore, other interaction capabilities a robot should master include turn-taking, recognizing its interaction partner, understanding and yielding to the user's initiative but also taking initiative itself (mixed initiative, Lütkebohle et al. 2009), and joint attention; it should also be possible to interrupt the robot. As interaction capabilities present a great challenge for robotics, the interactions with users are mostly preprogrammed and rigid, leaving little to no freedom to the teacher (Vollmer et al. 2016).

Autonomy
The level of autonomy of a robotic system refers to the degree to which a robot acts by itself based on sensing and recognizing its environment. Autonomy is the opposite of remote-controlling the system, as is the case in WoZ scenarios. In our analysis, we noted a statistical difference compared to HRI studies in general (as reported by Baxter et al. 2016) with respect to the level of autonomy: system autonomy tends to be higher in studies focusing on human teaching behavior (66.67% completely autonomous systems and 22.22% partly autonomous systems, as opposed to 48% fully autonomous systems for HRI studies in general). When human teaching is studied, it seems to be studied more frequently after a system with some aspects of autonomy has been developed. This might be because the influence of the wizard's human behavior on the study is minimized, and because remote-controlling complex behaviors is often not as easy as developing them autonomously.

Cost and Resources
In general, for both WoZ scenarios and fully autonomous systems, effort needs to be made to realize the system. For WoZ scenarios, the interface from the wizard to the robot needs to be implemented. Usually, such an interface allows the wizard to realize certain interaction primitives for which he/she would otherwise be too slow using the robot's default software. Depending on the complexity of the interaction, engineering adequate wizard interfaces can be challenging (a minimal example is sketched below). Also, for time-critical feedback, WoZ interfaces might still not be responsive enough to achieve natural interaction, and in WoZ scenarios human errors or delays might create artifacts in the experimental results.
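The minimal wizard interface referenced above could, for example, map single key presses to predefined interaction primitives so the wizard can trigger them quickly. This is a rough sketch under the assumption of a hypothetical `robot` object with `nod()`, `gaze_at()`, and `say()` methods; it is not the interface of any reviewed study.

```python
# Minimal Wizard-of-Oz console: one key press triggers one interaction primitive.
# The primitives and the robot interface are placeholders for a real system.
def run_wizard_console(robot):
    primitives = {
        "n": lambda: robot.nod(),                          # acknowledge
        "g": lambda: robot.gaze_at("object"),              # show attention
        "q": lambda: robot.say("Can you show me again?"),  # request repetition
    }
    print("Wizard console - keys:", ", ".join(primitives), "(x to exit)")
    while True:
        key = input("> ").strip().lower()
        if key == "x":
            break
        action = primitives.get(key)
        if action:
            action()   # trigger the primitive on the robot
        else:
            print("Unknown key, no primitive triggered.")
```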
In the literature review, we observed that due to these challenges the complexity of robot behavior is kept low to keep the implementation effort and cognitive load for the wizard low.In many studies robot feedback was simulated (see definition in Section 3) because of the difficulties and implementation effort of automatically detecting the necessary cues triggering the feedback.
Another problem is the cost of advanced robotic platforms, especially if they need to be capable of actual physical interaction with the human or manipulation of objects.Furthermore, non-commercial platforms might be difficult to acquire.
Based on the reviewed literature, it seems common practice to reuse existing setups for multiple studies.The reason might be the above points of platform cost and availability, and implementation effort.

Availability and Properties of Learning Mechanisms
It is difficult to study teaching behavior with fully autonomously learning systems because (inexperienced) human users commonly give only few demonstrations. Thus, in typical HRI teaching interactions, there is only a limited amount of training data. Therefore, either the developers define a certain frame and reward function that defines the task, or it is very hard to acquire enough training data for more advanced methods such as deep learning (Mouret 2016). Appropriate learning methods for flexible, incremental, real-time learning from few layperson demonstrations have yet to be developed. Hybrid methods operating with pre-trained models that can flexibly adapt to small amounts of data could be one way (a toy illustration of incremental learning from few examples follows below). Another way would be more experiments with long-term interaction and learning in HRI. However, this creates new technical challenges, e.g. for the durability of systems and the migration of learned models between potentially changing software versions. Regarding this issue of the limited amount of training examples, meta-learning mechanisms such as adaptive exploration (Moulin-Frier and Oudeyer 2013) or human-like cognitive control abilities may be crucial (Khamassi et al. 2011).
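The toy illustration referenced above: a nearest-prototype learner that updates one running mean per skill label with every new demonstration and can classify from the first example onward. This is our own simplistic example of incremental learning from few demonstrations, not one of the methods discussed in the literature; feature vectors per demonstration are an assumed input representation.

```python
import numpy as np

class PrototypeLearner:
    """Incremental nearest-prototype learner: each new demonstration (a feature
    vector) updates a running mean for its label; classification picks the
    closest prototype. Works from the very first example onward."""

    def __init__(self):
        self.prototypes = {}   # label -> (mean_vector, count)

    def add_demonstration(self, label, features):
        features = np.asarray(features, dtype=float)
        if label not in self.prototypes:
            self.prototypes[label] = (features.copy(), 1)
        else:
            mean, n = self.prototypes[label]
            # incremental mean update with the new demonstration
            self.prototypes[label] = (mean + (features - mean) / (n + 1), n + 1)

    def classify(self, features):
        features = np.asarray(features, dtype=float)
        return min(self.prototypes,
                   key=lambda label: np.linalg.norm(self.prototypes[label][0] - features))
```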

Generalizability of Results
One obstacle for the generalizability of results is the robotic platform used for studying teaching behavior. Typically, behavioral aspects of the robotic system are varied among experimental conditions, but the utilized platform is secondary and receives no further attention beyond being mentioned. Only in a few studies is the platform itself varied. It is known that humans make strong assumptions based on the appearance of a robot. Evaluating the impact of different platforms in an experiment requires financial resources and additional implementation effort. Different robotic systems sometimes come with different tool-chains and basic behaviors, which might affect experimental outcomes as well. However, it is often assumed that results are comparable within groups of platforms such as humanoid robots. Baxter et al. (2016) report the common use of Null Hypothesis Significance Testing in HRI studies which, although the method is statistically valid, might be an issue for the interpretation of results. Thus, care must be taken when inferring teaching behavior patterns from HRI experiments.
Even further, the question whether results obtained in HRI scenarios can be generalized to human teaching behavior in general is not trivial. A growing body of research on HRI provides evidence that teaching a robot is indeed like teaching a human (with respect to phenomena such as perspective taking, alignment, emotional responses, etc.; for an overview see Krämer et al. 2012) and that results are thus generalizable. Supporting this line of argument, the reviewed research on HRI also suggests that humans extend their teaching strategies to robots (Nagai et al. 2010; Muhl and Nagai 2007; Pitsch et al. 2013; Kim et al. 2009; Lohse et al. 2013; Vollmer et al. 2009a).
We think that human-like appearance and behavior in robots elicits teaching close to teaching in HHI. However, as the interactivity of current robotic systems is restricted and interactions so far lack flexibility, this topic needs to be explored further and should receive more attention in future research.

Replicability
First of all, replicating a study with robots requires the robotic platform to be available at different locations. For commercial platforms this might only require financial resources; however, robotics is a fast-paced field, such that robotic platforms change or their production is discontinued. It cannot be assumed that custom platforms are available to every institute wanting to reproduce a certain study.
Similarly, robotics software also undergoes continuous development. For a certain period of time, versions are typically compatible, but as the environment and possibly the platform are updated, older versions of the software cannot be used anymore. The robot behavior is usually described in the publication, but this description is seldom complete enough to reproduce the behavior. Detailed data capturing all aspects of the behavior, for example the robot's motions beyond what is relevant for the present study, is typically not available.

Additional General Challenges
Additional challenges pertain to the current state of robotics hardware. Studies involving physical interaction with robotics hardware can be difficult to realize because, although compliant robotics hardware is available, the probability of failure is still relatively high. Many robotic systems do not have the ability to handle collisions; for these systems, even incorrectly estimating a table height can cause breakage. Physical interaction with non-expert users can also cause breakage if the system is not sufficiently robust. Some robots generally require frequent maintenance, temporarily interrupting the experiment. We observe that often smaller platforms are used, which are more cost-effective and easier to maintain; however, they typically cannot manipulate heavier objects. In contrast, interacting with larger robots might require safety measures to protect humans and the often costly robot platform from injury and damage. Also, bugs in the robot system software might affect the outcome of a study. Although the behavior is described as accurately as possible, programming errors might affect the reproducibility and outcomes of the study.

Requirements for Studying Teaching Behavior
From the analysis of the studies and their results presented in Section 5, we can derive a set of requirements for the design of robot systems and the design of studies for investigating human teaching in HRI.

Inducing teaching
In a study of the kind we review here, the robot's appearance and behavior should induce teaching. The idea of inducing teaching behavior is to use the appearance and the behavior of the robot to trigger human assumptions or reflexes. Ideally, the resulting human behavior is similar to that in typical teaching situations, so the robot system can benefit from the teacher as an information source. However, if assumptions are inconsistent across participants, their behavior toward the robot might be highly variable. In some experiments this could be a desired effect, while in others this variance can mask effects relevant for the actual hypothesis.

Relation to the study design
The aim of the robot design is a robot that is ideally suited to the achievement of the goal of the experiment, meaning that:
- The robot is capable of performing all necessary actions the experimental design requires (legs for standing at a table, hands for lifting/manipulating objects, speech synthesis software).
- It enables the robot behavior targeted by potential independent variables of the experiment (enabling the manipulation of certain robot behavior like different gaze conditions or different learning speeds).
- It provides a suitable subject for human adaptation processes such that it triggers the behavior adaptations targeted by potential dependent variables of the experiment (triggering motionese through child-like appearance or supporting multiple iterations of teaching if required).
- It minimizes undesired adaptations in teaching, such as the generation of unrelated artifacts (a robot frequently gazing out the window causes repair activity for attention).

Interactive response (learner feedback)
Humans always try to find and interpret feedback signals. An important factor is that some response of the learner can be linked to the teacher's behavior within the ongoing interaction.
The absence of such a response creates a disturbance in the interaction loop between teacher and learner. Except in cases where this is itself part of the study, we recommend an attention mechanism as a baseline behavior, together with feedback revealing to the teacher what the learner has understood. This has proven to be an effective way of facilitating basic HRI.

Implementation effort and system autonomy
The implementation effort for the study should be minimized, and autonomous robot behavior not linked to the experimental conditions should be maximized. The implementation effort can be mitigated with a WoZ setup for simple behaviors, especially in the sensitive conditions.

Fig. 1
Fig. 1 Robotic platforms used in the reviewed publications. The learners used in Kaochar et al. (2011) and Knox et al. (2012) are not included here as they are not depicted in the respective publications.
Lohan et al. (2010) and Fischer et al. (2012) further analyzed teaching behavior with the same set of actions towards an embodied robot and compared it to teaching towards the Akachan simulation (see Fig. 1g, Table ). Avrahami et al. (1997) also found differences in examples provided by teachers for passive or active learners. For active learners, different first examples were chosen.

Table 1
Thomaz and Breazeal (2008), Reinforcement Learning agent: (a) the reward channel is not only used for feedback, but also for future-directed guidance; (b) positive bias in feedback; (c) change of behavior while developing a mental model of the learner.

Table 6
Cakmak and Thomaz (2012)
Teaching / learning content: theoretically different tasks, e.g. sandwich, pour, salt. Fix: speech commands, tasks, modality of teaching (kinesthetic). Free: actual movements, answers to robot queries.
Robot: Simon, a 7-DOF full-size humanoid robot (a Mekabot, a torso on a stand, no legs) (Fig. 1e).
Robot behavior: Autonomous: gaze to hand during demonstration; speech and head nods triggered by user commands; asking prescripted questions accompanied by preprogrammed movements based on a previous human-human study. Question types: Label, Demo, Feature queries. Teleoperated: advance the interaction sequence after a query answer.
Results: HHI: feature queries are most common; physical actions highlight/instantiate features for queries; types of queries, types of gestures/instantiations. HRI: feature queries are perceived as smartest.

Table 7
Rosenthal et al. (2009)
Type of study: human-robot collaborative task experiment, answering robot questions, WoZ, no learning involved.
Teaching / learning content: recognition of block shapes. Fix: construction. Free: choice to answer, answers.
Robot: Robosapien v2, a small humanoid robot (Fig. 1m).
Robot behavior: Autonomous: follow faces and red, green, and blue colored objects (LEDs rotate towards target). WoZ: asking prerecorded questions.
Simulated capabilities: Perceptual: segmenting objects, extracting object properties (shapes). Cognitive: when to ask questions about which blocks, speech production, object understanding.
Conditions: (3×2×2×2) combinations of state information; robot questions in three conditions: no contextual information, local context of the colors of nearby blocks, and global information (color and position of the block in the structure).

Table 9
Vollmer et al. (2009a) Contingency measures (e.g., frequency and length of eye-gaze bouts to interaction partner)

Table 11
Pitsch et al. (2013) and Vollmer et al. (2013b). Results: Human teachers apply their teaching strategies to robots. The teacher's action demonstration is shaped by the teacher's action knowledge, but also by the learner's feedback -in the form of action replication and eye gaze indicating what has been understood -which influences the repetition of action demonstrations and the modification of the teacher's movements. The robot can also provoke disturbances in the teacher's performance. The teacher's exact adaptation depends on their monitoring of the robot's behavior.

Table 16
Lütkebohle et al. (2009). Open questionnaire, questions about how participants would respond in the situation shown in the video / distribution of concepts and answers used by participants, distribution of initiative. Results: A dialog structure with system initiative helps to guide users.

Table 19
Identified features of human teaching which could be particularly efficient, neutral, or detrimental for learning
Literature