Dynamic testing: Can a robot as tutor be of help in assessing children's potential for learning?

This study examined whether computerized dynamic testing utilizing a robot would lead to different patterns in children's (aged 6–9 years) potential for learning and strategy use when solving series-completion tasks. The robot, in a "Wizard of Oz" setting, provided instructions and prompts during dynamic testing. It was found that dynamic training resulted in greater accuracy and more correctly placed pieces at the post-test than repeated testing only. Moreover, children who were dynamically trained appeared to use more heuristic strategies at the post-test than their peers who were not trained. In general, observations showed that children were excited to work with the robot. All in all, the study revealed that computerized dynamic testing by means of a robot has much potential in tapping into children's potential for learning and strategy use. The implications of using a robot in educational assessment are considered further in the discussion.

Test scores obtained after a scaffolded feedback procedure or intervention are more likely to provide a good indication of a person's level of cognitive functioning than conventional, static test scores. The primary aims of research in dynamic testing have been to examine progression in cognitive abilities following training between test session(s), to consider behaviour related to the individual's potential for learning, and to gain insight into learning processes at the moment they occur (Elliott, Grigorenko, & Resing, 2010; Resing, Touw, Veerbeek, & Elliott, 2017). Dynamic test procedures differ from static ones because, in a dynamic test situation, testees are given (guided) instruction enabling them to show individual differences in progress when solving equivalent tasks.
The aim of the current study was to investigate whether a computerized one-on-one dynamic test administered by a tutor robot could allow for systematic and controlled dynamic testing. In doing so, we sought to examine the effects of receiving instruction and training by a robot on children's changes in performance across test sessions.
A major difficulty in undertaking highly interactive forms of assessment is that the assessor must try to fully engage with the child while also recording in detail each step in the process. A key advantage of computerized testing is that it may be possible to register every task-solving step taken by the child, which would provide examiners with the opportunity to analyse the sequence of these steps. This would offer valuable information about the child's learning progression during the dynamic process. Computer-assisted instruction provided by a personalized robot may also offer promising new possibilities for dynamic testing. These include using more flexible approaches to task-solving, using more adaptive scaffolding procedures and, consequently, creating a more authentic assessment environment (Huang, Wu, Chu, & Hwang, 2008; Khandelwal, 2006). Therefore, the one-on-one tutor robot in the present study, which had an appearance attractive to children, was designed to detect the children's task-solving steps, provide hints to solve the tasks, record in detail children's responses to that assistance and react adaptively to children's solving behaviour.

| Dynamic testing of inductive reasoning
Conventional, static tests are often used by educational and school psychologists and are viewed as a satisfactory means of measuring previous learning. Dynamic test measures, on the other hand, often employing a test-training-test format, are designed to assess developing or yet-to-develop abilities (Elliott et al., 2010; Sternberg & Grigorenko, 2002).
The theoretical framework for dynamic testing can be linked to the ideas of Vygotsky (1978), who posited that children's learning can be characterized as a social process, occurring in their zone of proximal development. This zone of proximal development has been defined in terms of the difference between children's independent task-solving, the actual level of development, and their level of task-solving after help or instruction has been given, often in the form of scaffolds, the potential level of development. The current study made use of robot-administered structured pre-test and post-test instructions, and a graduated prompts training procedure in between, consisting of two separate sessions in which children were provided with hints to help them solve the tasks.
These prompts (or hints) included increasingly more specific and explicit feedback on how to solve the task presented. The hierarchical step-by-step provision of these prompts was given in accordance with the child's perceived needs, based on their given solution of the task. In the current study, we programmed this hierarchical step-by-step procedure in order to examine the effectiveness of dynamic testing provided by a robot.
Our study focused on children's performance on series completion tasks, a subtype of inductive reasoning which has been shown to be a sensitive indicator of children's problem-solving ability (e.g., Holzman, Pellegrino, & Glaser, 1983; Molnár et al., 2013). We used a schematic series-completion task utilizing puppets: children were shown a series of puppets with different arms, legs, bellies and heads; had to discover what the next puppet in the row had to look like; and had to construct the right puppet using tangible puzzle pieces. With regard to solving inductive reasoning tasks, a distinction has been made between analytical and heuristic strategies (e.g., Klauer & Phye, 2008). An analytical strategy requires investment of time in planning the solution, whereas a heuristic strategy utilizes more time to test and retest (partial) hypotheses about the solution process.
Consequently, an analytical strategy requires more time during the first, planning phase of task-solving, whereas a heuristic strategy requires more time for testing hypotheses about the solution. In the current study, we examined children's analytical and heuristic strategy use in solving series completion tasks.
The series completion task in the current study employed three-dimensional tangible puzzle pieces, which allowed the children to manipulate all the pieces freely, and enabled observation of their ways of solving the tasks.

| Computerized dynamic testing
To the authors' knowledge, no research has been conducted in which a robot was used to administer a dynamic test. Resing, Steijn, Xenidou-Dervou, Stevenson, and Elliott (2011), however, investigated whether computerized dynamic testing, using a multiple assessment, test-training-test format with graduated prompts given by a computer, provided more information about test performance than when these prompts were given by an examiner. Key to this graduated prompts approach is the possibility to incorporate feedback and tailored assistance into the training phases (Elliott et al., 2010; Grigorenko, 2009; Jeltova et al., 2011). Although no differences in accuracy were reported, the computerized version of the dynamic test provided more detailed information on the individual task-solving processes. In an earlier study, Tzuriel and Shamir (2002) reported that children who were assisted by a computer when taking the Children's Seriational Thinking Modifiability Test showed more cognitive change than when feedback was provided by an examiner. Such findings, in combination with the seamless learning possibilities a robot offers, and the tangible 3D-task used in this study, suggest that computer-assisted instruction provided by a robot may offer promising new possibilities for dynamic testing.
In the current study, a small, friendly tutor robot was utilised during a sequence of test sessions. We developed a Wizard of Oz setting (Dahlbäck, Jönsson, & Ahrenberg, 1993), with the examiner partially operating the robot by computer. The behaviour of the robot was preprogrammed to be adaptive to the child's responses and incorrect task-solving behaviour, based on the outcomes of former studies. Furthermore, the robot was preprogrammed to provide oral prompts and scaffolds (individual hints based on the children's actions) when solving the inductive reasoning tasks. In addition, it was programmed to give general feedback and short instructions, and to interact nonverbally by, for instance, naming the child, nodding, dancing and blinking its eyes. The robot was tele-operated and thus controlled by an examiner. The current study sought to investigate the potential of using this robot as an assessment tool for children solving reasoning tasks in a dynamic testing context.
We also aimed to get a first impression of the interactions between children and the robot for the development of an optimal and authentic learning and assessment environment.

| Use of robots in education
The use of robots in education, for example, for psycho-educational testing and assessment, has been associated with several advantages.
Robots are well suited to physically and socially engage with learners and their environment, with learners showing more social behaviour beneficial for learning and increased learning gains vis-à-vis other forms of technical support that do not have a physical embodiment (Belpaeme et al., 2018). Positive effects of the use of robots compared with other forms of technical assistance have been found in the cognitive as well as the affective domain. With regard to the affective domain, recent studies in the field of education have examined the use of technology to support and increase children's motivation in classrooms (Chin et al., 2014). The presence of robots has been found to have a positive influence on children's motivation when solving cognitive tasks (André et al., 2014). A robot's movement and body gestures appear to be interesting motivators that could affect a respondent's decision-making processes (Shinozawa, Naya, Yamato, & Kogure, 2005). Furthermore, studies have shown that robots designed to express social cues positively influenced respondents' motivation to finish a task and increased their desire to spend more time with the robot (Tanaka, Cicourel, & Movellan, 2007). In various studies, robot characteristics such as appearance, mobility and animation have been shown to influence even kindergarten children's ability to learn from robotic instructions and sustain their interest in completing tasks (e.g., Brown & Howard, 2013).
In relation to the cognitive effects of robots, young children have shown their ability to learn from a peer or tutor robot in several domains, such as vocabulary performance, (second) language learning, mathematics, science, thinking skills and self-regulated learning (Chang, Lee, Chao, Wang, & Chen, 2010; Hussain, Lindh, & Shukur, 2006; Jones & Castellano, 2018; Moriguchi, Kanda, Ishiguro, Shimada, & Itakura, 2011; Movellan, Eckhardt, Virnes, & Rodriguez, 2009; Sullivan, 2008). In such studies, participants demonstrated positive and engaging interactions with the robot. André et al. (2014) showed that robots could influence children's behaviour positively when they were given mental arithmetic tasks. Several authors have reported that robot-based instruction methods could have similar effectiveness as human instructors (Brown & Howard, 2013). Reaching a similar conclusion, Serholt, Basedow, Barendregt and Obaid (2014) noted that the children in their study asked the human instructor more often for help. They also concluded that children were able to follow instructions from a robot but added that more long-term interaction between subjects and a robot would be needed for studying lasting effects. In their overview of studies in the field of early language learning, however, Kanero et al. (2018) concluded that social robots are useful in language learning but not (yet) as effective as human teachers.
The current study examined the use of a robot during multiple assessment sessions. We sought to examine the effects of receiving instruction and training by a robot on children's performance across these test sessions.

| Current study aims
In the light of the promising findings about children's engagement with robots in the classroom (e.g., André et al., 2014; Baxter et al., 2017; Belpaeme et al., 2018; Benitti, 2012; Deublein et al., 2018; Kozima & Nakagawa, 2007; Tanaka et al., 2007), we sought to examine the potential of utilizing a tutor robot in a dynamic testing setting, as a means to interact with the children and to record their performance.
We focused on four key underlying issues.
Our first task was to examine the effect of training with graduated prompts, provided by the robot, on children's inductive reasoning performance. We expected that trained children would demonstrate larger increases in pre-test-post-test progressions in accuracy and the number of correctly solved pieces of the series completion task compared with untrained children. These expectations were in accordance with findings from Resing, Xenidou-Dervou, Steijn, and Elliott (2012), Stevenson, Touw, and Resing (2011), Tzuriel and Shamir (2002), Passig et al. (2016) and Wu, Kuo and Wang (2017).
Secondly, we investigated children's need for instructions, provided by the robot, during training, and expected that the number of prompts children required during the training sessions would decrease from training 1 to training 2, indicating a learning effect (Authors, 2011). In doing so, we inspected the types of prompts provided separately (metacognitive, cognitive and modelling). Thirdly, we examined whether training would influence children's strategy use by examining how their inductive reasoning performance changed at a behavioural level. We expected a change towards a more advanced, analytical strategy level for trained children only (Resing et al., 2012).
In addition, we explored individual differences in the progression of children as a consequence of dynamic testing. The trained children were split into groups on the basis of the number of prompts they needed during training, in combination with lower or higher pre-test scores. We explored whether the progression paths in inductive reasoning of the various groups of children were significantly different.

| Participants
Fifty-two children, with a mean age of 96 months (SD = 7.2 months; range = 83–116 months), participated in this study.
The children, 26 girls and 26 boys, were recruited from four second and third grade classes of middle-class elementary schools, located in the western part of the Netherlands. All children were born in the Netherlands, and Dutch was the first language spoken at school and at home. The schools were selected on the basis of their willingness to participate. Prior to the study, written informed consent was obtained from the schools and parents. The testing was undertaken by three trained postgraduate students with teaching experience.
One child was not present during the administration of the second training session; his data were not included in any of the analyses. This research project was approved by the ethics board of our university.

| Design
The study employed a pre-test-training-post-test control-group design (see Table 1) with randomized blocking on the basis of children's scores on Raven's progressive matrices (Raven, Raven, & Court, 2003), administered before dynamic testing started. This blocking procedure is often used in studies with rather small experimental and control groups to ensure that the groups do not differ substantially with regard to an important study variable, in this particular case the mean level of reasoning. Children's scores on the Raven test were ordered from high to low, and pairs were made of children with equal or adjacent scores. On the basis of this blocking procedure, administered per school and grade, children were, per pair, randomly assigned to either a dynamic test group (training condition: pre-test, training and post-test) or a static control group (control condition: pre-test, control task and post-test). In each school and grade, 50% of the children were allocated to the training condition; the others were assigned to the control condition. Children in both conditions were administered the pre-test and post-test of the series completion task (Sessions 1 and 4; see Table 1).
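As an illustration, the pairing-and-randomization step can be sketched in a few lines of code. This is a reconstruction for clarity only; the function and field names are ours, not part of the study's materials:

```python
import random

def blocked_assignment(children, seed=None):
    """Pair children with adjacent Raven scores and randomly assign one
    member of each pair to the training condition, the other to control."""
    rng = random.Random(seed)
    # Order scores from high to low, as in the blocking procedure.
    ordered = sorted(children, key=lambda c: c["raven"], reverse=True)
    training, control = [], []
    # Walk down the ranking two children at a time; each adjacent pair
    # is split at random over the two conditions.
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)
        training.append(pair[0])
        control.append(pair[1])
    return training, control
```

With an even number of children per school and grade, this yields the 50/50 allocation described above while keeping the mean Raven level of the two conditions close.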

| Raven's progressive matrices
The Raven's progressive matrix test (Raven et al., 2003) measures the ability to detect rules by means of induction, a prerequisite for successful inductive/serial reasoning. Each item is composed of a visual-spatial 3 × 3 matrix in which one part is missing. Children were instructed to select the missing piece from a number of alternatives.
Split-half coefficients were reported as a measure of the reliability of the test (r = 0.91; Raven, Raven, & Court, 2000).
Note. R: the robot was available on the child's desk. The Raven's progressive matrix test was administered in class, before dynamic testing started.

| The robot
In this study, a small table-top robot, developed by WittyWorX (2012), was utilized. The robot had an appearance similar to a wise but friendly owl (see Figure 1). It was about 20 cm tall, and could easily be placed on a child's desk. The robot was preprogrammed to speak, dance, move, show feedback with its eyes, and react to touch. Nonverbal behaviour included emotions (happy/neutral) as shown by the eyes (two colour displays), nodding or head-shaking and dancing (body movement was possible in all directions). With its sensors and expression abilities, it was expected that the robot could interact with the children playfully and hold their attention.
The robot's stand-alone abilities were not fully developed at the time of testing. As mentioned, we utilized a Wizard of Oz setting, and the examiner, quietly sitting in a corner of the room behind the child, served as the eyes and ears of the robot. For sensory input, the robot was equipped with a camera, microphones and touch sensors, so that the solving processes of the children could be filmed (only capturing hands and voice to safeguard children's anonymity). All robot behaviour (and that of the examiner) was preprogrammed by using if-then scenarios with utterances, sounds and eye/body movements. As the examiner functioned as the eyes and ears of the robot and could follow the filmed solving behaviour of the child on a laptop, she could, by pushing a button on this screen, influence the preprogrammed behaviour path of the robot, according to the fixed scenarios. The robot was programmed in such a way that it was able to interact and give feedback at the right time. Children could press the head of the robot to indicate that they were finished with a task and ready for the next one.
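Conceptually, such if-then scenarios form a small lookup table mapping the robot's current state and an examiner-signalled event to an utterance or movement and a next state. The sketch below is purely illustrative; the state, event and action names are hypothetical stand-ins, not the robot's actual scenario set:

```python
# Hypothetical if-then scenarios: (state, event) -> (action, next state).
# The examiner's button press supplies the event; the robot plays the action.
SCENARIOS = {
    ("awaiting_answer", "answer_correct"): ("praise_and_nod", "next_item"),
    ("awaiting_answer", "answer_incorrect"): ("give_prompt", "awaiting_answer"),
    ("awaiting_answer", "head_pressed"): ("announce_next_item", "next_item"),
}

def robot_step(state, event):
    """Return the robot's scripted action and its next state; unmapped
    (state, event) combinations leave the robot waiting in place."""
    return SCENARIOS.get((state, event), ("wait", state))
```

Because every transition is preprogrammed, the examiner can only select among fixed paths, which is what kept the robot's behaviour systematic and controlled across children.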

| Dynamic test: Series completion
We used a dynamic visual-spatial task adapted from earlier work. During the pre-test and post-test sessions, children were given series of schematic-picture completion problems that all consisted of a line of six puppet pictures printed in a booklet, followed by an empty box with a question mark (see Figure 2). They were asked to construct the seventh puppet on a white empty frame on their desk by placing eight transparent perspex pieces into the right configuration. Items could be solved by observing the systematic changes that occurred in the row and uncovering the underlying solution rule(s). Each answer had to consist of one head, two arms, two legs and a torso comprised of three pieces.

| Dynamic test: Training
The 2 × 6 items used during the two training sessions were constructed at the same difficulty level and were equivalent to those used during the pre-test and post-test sessions. However, children were now told at the start of the training sessions that the robot would help them find the correct puppet. The training procedure was based on graduated prompt procedures adapted from previous studies and was developed on the basis of earlier process models of the specific dynamic test utilized in the current study. This so-called graduated prompts procedure provided children with prompts to help them solve the problem. These prompts included increasingly more specific and explicit feedback on how to solve the task presented.
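In outline, the graduated prompts principle works through a fixed hierarchy of increasingly specific prompts until the child solves the item. The sketch below assumes five prompts per item ordered from metacognitive to modelling, consistent with the prompt types named in this study, but the prompt labels and the `child_solves` callback are our own illustrative stand-ins:

```python
# Illustrative prompt hierarchy: general (metacognitive) prompts first,
# a full demonstration (modelling) as the last resort.
PROMPT_ORDER = ["metacognitive_1", "metacognitive_2",
                "cognitive_1", "cognitive_2", "modelling"]

def administer_item(child_solves):
    """Give prompts one by one until the child succeeds.

    `child_solves(prompts_given)` stands in for the child's attempt after
    receiving the listed prompts; returns the prompts actually given.
    """
    given = []
    while not child_solves(given):
        if len(given) == len(PROMPT_ORDER):
            break  # hierarchy exhausted after the modelling prompt
        given.append(PROMPT_ORDER[len(given)])
    return given
```

The number of prompts a child accumulates under such a procedure is exactly the quantity later used as an indicator of potential for learning.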

| Procedure
The four test sessions took place once a week. All children were seen individually in their school during all four sessions. During the training, the robot interacted with the child and gave feedback and prompts; the recordings were only used to check the quantitative data. Every step taken by the child (and the robot) was saved in a log file.
The robot performed all the interactions with the child during the four sessions, operating with the help of the examiner who sat silently in a corner of the room, being part of our "Wizard of Oz" constellation.
Voices and sounds uttered by the robot were actually initiated by the examiner, who followed the task-solving behaviour of the child on the computer screen and had to push a button before the robot could execute the next step. The camera enabled the examiner to simultaneously analyse in detail the task-solving behaviour of children during the dynamic test sessions. The series completion items of all parts of the dynamic test were simulated on a laptop, and the examiner had to mimic the task exactly and at the very same moment as the child.

| Number of prompts
For each item, we counted whether a child received each type of prompt at least once. The maximum number of prompts was 30, as there were six items and five types of prompts per item.

| Learner groups
The trained group of children was split into four learner groups: children who needed many versus few prompts during training, crossed with children who had low versus high pre-test scores.
Median splits were used to separate the children into the four groups.
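This double median split can be expressed compactly. The sketch below is illustrative only; the field names are invented, and the handling of scores exactly at the median (assigned to the "low"/"few" side here) is our assumption:

```python
import statistics

def learner_groups(children):
    """Split trained children into four learner groups via median splits
    on pre-test score and on the number of prompts needed during training."""
    pre_median = statistics.median(c["pretest"] for c in children)
    prompt_median = statistics.median(c["prompts"] for c in children)
    groups = {}
    for c in children:
        pre = "high_pretest" if c["pretest"] > pre_median else "low_pretest"
        need = "many_prompts" if c["prompts"] > prompt_median else "few_prompts"
        groups.setdefault((pre, need), []).append(c)
    return groups
```

Nothing guarantees the four cells are filled evenly; as reported below, one cell contained only two children and was excluded from analysis.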

| Behavioural strategy use
The data gathered during the dynamic test sessions were compiled into log files. The outcome variables analysed were related to accuracy, time, efficiency and task-solving behaviour. Scoring of children's behavioural strategies during the test sessions was based on the observed solution times (ST) at different stages in the task-solving process (Kossowska & Nęcka, 1994): the initial period, which referred to the time before the first body part was placed; the middle ST, which referred to the period before the next piece was placed; and, lastly, the end ST, which referred to the total time it took children to solve the problem.
Behavioural Strategy Use = (Initial ST + Middle ST) / (Initial ST + Middle ST + End ST) × 100.

Higher scores on the Behavioural Strategy Use measure were thought to reflect the use of an analytical strategy, as children spent relatively more time on the preparatory stage (initial and middle ST) of task-solving. Lower scores were assumed to reflect a heuristic strategy, indicating that the children took more time for the execution stage than for the initial and middle stages. Children with low scores were likely to have thought more globally about what the last puppet should look like (Resing et al., 2012).
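Concretely, the measure is the percentage of the summed solution-time components spent in the preparatory stage. A minimal computational sketch (the function and variable names are ours, not from the original coding scheme):

```python
def behavioural_strategy_use(initial_st, middle_st, end_st):
    """Percentage of summed solution time spent in the preparatory stage.

    Higher values suggest an analytical strategy (more planning time);
    lower values suggest a heuristic strategy (more execution time).
    Times may be in any consistent unit (e.g., seconds).
    """
    preparatory = initial_st + middle_st
    total = preparatory + end_st
    if total == 0:
        raise ValueError("solution times must sum to a positive value")
    return preparatory / total * 100

# A child spending 10 s + 10 s preparing against 20 s overall scores 50.
print(behavioural_strategy_use(10, 10, 20))  # 50.0
```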

| RESULTS
Two one-way analyses of variance (ANOVAs) were conducted to examine whether there were any differences between the two treatment groups regarding their initial level of inductive reasoning and age. The analyses revealed that the two treatment groups did not significantly differ with regard to their average age (F[1, 49] = 3.47, p = 0.07) nor their initial reasoning performance at pre-test.

| Effects of training
First, we analysed children's performance on the series completion task and the effect of receiving training by the robot on their reasoning progression, regarding two outcome variables: (1) accuracy, measured as the total number of correctly constructed puppets at pre-test and post-test, and (2) the total number of body parts positioned correctly at pre-test and post-test.

| Total correct
We expected that the trained children would show greater progression in reasoning accuracy than the children in the control group.
The effect of training on accuracy was examined using a repeated measures ANOVA with Condition (training/control) as betweensubjects factor and Session (pre-test/post-test) as within-subjects factor. The number of accurately solved items was the dependent variable. The change in reasoning accuracy across sessions is depicted in

| Total body parts correct
The number of body parts children had positioned correctly at pre-test and post-test was analysed with a repeated measures ANOVA with Condition (training/control) as between-subjects factor and Session (pre-test/post-test) as within-subjects factor. The progression in the number of correct body parts for children in both conditions is depicted in Figure 4 (and Table 2).

| Prompts during training
The number of prompts children needed during training was considered to be one of the indicators of their potential for learning. Children showed large individual differences in the number of metacognitive (training 1: ranging from 0 to 11; training 2: from 0 to 12) and cognitive (training 1: from 0 to 15; training 2: from 0 to 18) prompts.
Contrary to our expectations, this number did not significantly decrease from training 1 to training 2 (t[25] = 0.35, p = 0.78), as depicted in Table 2.
We then investigated whether the behavioural strategy use of the children in the training and control conditions changed differently from pre-test to post-test. Another repeated measures ANOVA was conducted with Session (pre-test and post-test) as a within-subjects factor, Condition (training and control) as a between-subjects factor, and Behavioural strategy as the dependent variable.

| Exploring learner groups
In addition, we explored individual differences in the progression of children. The trained children were split into four learner groups: needing many versus few prompts during training, in combination with lower or higher pre-test scores. The low pre-test and low prompts group included only two children and was not included in the analysis.

| Observations
From the outset of this study, children were highly excited and motivated to work with the robot, which appeared to know every child by name. They liked the testing periods very much and were eager to work with the robot. After a short period of time, they were talking to Myro as if it was a teaching assistant, and most of the time ignored the examiner who was sitting in a corner of the room. Because the instructions provided by the robot were highly structured, they sometimes pushed it on the head and said things like: "Keep your mouth shut; you have said that now too often, Myro." Their teachers also responded enthusiastically, many asking if they could play the game with the robot, so a general meeting was planned after the study ended.

(Figure 6. Behavioural strategy scores at pre-test and post-test for training and control group children; total, metacognitive, cognitive and modelling prompts.)

| DISCUSSION
The present study focused on the potential of using a preprogrammed tutor robot in dynamic testing. In line with earlier studies (e.g., Freund & Holling, 2011; Stevenson et al., 2013), our study showed that task performance generally improved when children were tested twice, but that the degree of progression varied, depending on whether or not children were dynamically trained by the robot on the task (e.g., Campione & Brown, 1987; Passig et al., 2016; Resing et al., 2012). Children who were dynamically tested and trained by the robot showed significantly greater progression both in their accuracy of task solving and in the more detailed number of correct puzzle pieces than children who were just statically tested by the robot. We believe we can safely conclude that the intervention provided by our friendly table-top robot led to these differences in progression, because the same tasks and instructions were tested and positively evaluated in other studies (e.g., Veerbeek et al., 2019). Of course, a future study, in which a second control group solves the items used in both training sessions in a static, unguided way, would provide extra information necessary to further confirm this conclusion. Past dynamic testing research with children in these three conditions already provides further support for this conclusion (e.g., Resing, 1993, 2000). Another useful direction for future studies concerns investigating the potential advantages of robot-administered as opposed to computerized or human-administered dynamic tests, to research whether robot-administered dynamic testing has benefits beyond those of human and computerized testing. Interestingly, children's progression paths increased, whereas the number of prompts they needed did not decrease from the first to the second training.
This could be partially due to the difficulty level of the series-completion items; they were developed as rather difficult items. The scaffolding and graduated prompts principles behind the training given by the robot were specifically designed to tap into children's zone of proximal development (Serholt & Barendregt, 2016; Vygotsky, 1978). When we explored the variation in progression in task solving in relation to the outcomes, large individual differences were detected.
Of course, the data regarding learner groups are rather speculative given the small subgroups of children, but they are promising and highlight the potential extra value of individualized forms of dynamic testing, in particular with computerized robot technology. In the future, outcomes of an extended study will have to support these preliminary findings.
The current study shows that our dynamic training provided by a robot also differentially influenced children's behavioural strategy use, as measured by the time children needed to actually start solving each task item. Unexpectedly, the trained children made less use of an analytical strategy after training than their peers who did not receive training. The untrained children, however, appeared to use an analytical strategy more frequently during the post-test. Nevertheless, our first, global check of the log files revealed that trained children placed the puppet blocks more systematically; they first selected little piles of equal blocks, for example, three green ones for the body of the puppet; then made a three-piece block of the body; and finally placed that three-piece block on the puppet frame. Untrained children, on the contrary, frequently seemed to use quick trial-and-error behaviour or solved the puzzle piece by piece. Perhaps the unexpected findings regarding the increase in heuristic strategy use by the trained children reflect familiarity with the task, as a result of which these children required less preparation. This finding underlines that we cannot rely solely on reaction time data in relation to children's behavioural strategy use (e.g., Kossowska & Nęcka, 1994); future research with a larger sample size will be necessary to support our findings and inferences with regard to children's strategy use.
Step-by-step analysis of children's task-solving sequences would be one possible option (e.g., Veerbeek et al., 2019).
The results further support our idea that subgroups can be discerned that differ in their changing strategy use, particularly among the trained children, when combined with information regarding the number of prompts children needed during training and their progression in accuracy. The findings lent further support to the idea that dynamic testing outcomes can be helpful for educational assessors, because they provide interesting process information regarding inter- and intra-variability in children's use of strategies when learning to solve tasks.
In the current study, the robot provided prompts to the child when needed, but these were not yet optimally tailored to the, at times, highly idiosyncratic mistakes children occasionally made during training. Further research is necessary to ensure that the robots of tomorrow provide highly sophisticated and differentiated interaction responses in assessment contexts. With regard to the cognitive domain studied here, future research should be geared to the fine-tuning of prompts and dynamic scaffolds, adaptations to specific groups of children, examination of specific, systematic task analyses, and consideration of the patterns of mistakes and idiosyncratic ways of processing that children show when solving cognitive tasks (e.g., Granott, 2005; Khandelwal, 2006; Renninger & Granott, 2005). In the future, for example, the robot could be programmed to allow more variation and flexibility in its preprogrammed scenarios for providing feedback and instruction to individual children. Although we are aware that realizing all these requirements is a challenge, such developments should provide exciting possibilities for gaining further insight into children's differing learning paths during dynamic testing, or in relation to instruction in the classroom. Although the robot still had some obvious limitations, such as repeating instructions in exactly the same way, and although it was operated in a "Wizard of Oz" setting, children interacted with the robot freely, for example, by providing it with feedback, and remained highly responsive and motivated to work with it, even after all the assessment and training sessions. The vast majority of children did not even seem to notice that the examiner was seated in the back of the room.
A particular complication of dynamic testing, especially when individual strategy patterns and changes are the focus of assessment, is that detailed study of children's processing, including their responses to training, can easily result in an overload of information (derived from spoken, written, or videotaped sources) that is too complex and time-consuming to interpret and report. A personalized robot teaching assistant could certainly help to overcome this difficulty, especially if it were able to visually track the tangible pieces children freely place on the table. We consider this a key and unique aspect of using robotics in psychological and educational assessment, because both the development and the education of higher cognitive abilities originate in young children's sensory-motor activities (e.g., Timms, 2016), and the robot, in combination with the materials developed, matches these activities well. We anticipated and found that such technology can assist us in assessing and examining task-solving processes in more detail, thereby enabling us to inspect in depth more of the information processing that takes place during training, one of the key elements of process-oriented dynamic testing (Elliott, Grigorenko, & Resing, 2010; Jeltova et al., 2011; Sternberg & Grigorenko, 2002). As most empirical studies discussing the effects of robots as teaching tools involve learning closely related to the field of robotics itself, our findings have significant potential and should open further opportunities for the broader field of learning complex reasoning skills (Benitti, 2012).
We are aware that much effort in both hardware and software development will be necessary before educational robots are ready to assist teachers and educational psychologists in the classroom of tomorrow (e.g., Timms, 2016). We think, however, that the results of the current study reveal that even a simplified version of a real robot can, through its instructive teaching and patience, stimulate children in learning to solve complex reasoning tasks, with a potentially important impact on cognitive growth (Mubin et al., 2013). We noted that the children enjoyed the testing sessions very much and were eager to work with the robot during all assessment sessions. It would be valuable to study whether children in the control condition also learned from assessment by the robot, as they too were eager to leave the classroom for a next session with it. An extension of the study design with a focus on the novelty aspect of a robot-administered dynamic test would therefore be worthwhile.
The merits of using a robot as an assistant in dynamic testing are, of course, intriguing. Earlier studies have already highlighted the benefits of using an electronic console for dynamic testing. Our study replicated the potential of electronic technology for dynamic testing but also introduced the robot as a helpful co-assessor, with which the children could freely play with the tangibles, organizing and moving them. The robot proved to be an enjoyable dynamic companion, mostly because it possessed both verbal and nonverbal interaction qualities, with, for the moment, the examiner acting as Wizard of Oz in the background. Earlier research has also shown that offering the prompts and scaffolds via a preprogrammed computerized interface has no discernible negative consequences compared with those provided by an examiner (Tzuriel & Shamir, 2002).
Because the task prompts and scaffolds remained the same across studies, we think that these earlier findings are generalizable to the outcomes of the current study. Nevertheless, it will be necessary to compare an assistant-robot assessor with both a human and a (2D) computer administration of the dynamic test, to further validate the added value of robot-administered dynamic testing. Our recommendation for future studies would be to continue exploring the use of preprogrammed robot instructions to further reveal the learning processes unfolding during dynamic testing. This would open ways to tailored assessment of individual children's potential for learning (Clabaugh, Ragusa, Sha, & Matarić, 2015; Granott, 2005) and to a more sophisticated understanding of children's differential development in ways that can directly impact their learning.

ACKNOWLEDGEMENTS
We would like to thank Bart Dirkx and Ruus van der Aalst for their help with programming their robot-under-construction for our study and Huguette Fles, Nathalie Ijsselmuiden, and Karlijn Nigg for their help in collecting the data for this study.
NOTE 1 This procedure was explained in the informed consent letter, which was ethically approved by our university, and a data protection compliance protocol was followed. All data were anonymized, and video materials were destroyed after the data had been coded and checked.