Exploring the impact of socially assistive robots on health and wellbeing across the lifespan: An umbrella review and Meta-analysis

Background: Socially assistive robots offer an alternate source of connection for interventions within health and social care amidst a landscape of technological advancement and reduced staff capacity. There is a need to summarise the available systematic reviews on the health and wellbeing impacts to evaluate effectiveness, explore potential moderators and mediators, and identify recommendations for future research and practice. Objective: To explore the effect of socially assistive robots within health and social care on psychosocial, behavioural, and physiological health and wellbeing outcomes across the lifespan (International Prospective Register of Systematic Reviews (PROSPERO) registration: blinded for peer review). Design: An umbrella review utilising meta-analysis, narrative synthesis, and vote counting by direction of effect. Methods: Eight databases were searched (ProQuest Health Research Premium collection, Scopus, PubMed, Web of Science, ACM Digital Library, IEEE Xplore, Cochrane Reviews, and EPISTEMONIKOS) from 2005 to May 4, 2023. Systematic reviews including the effects of socially assistive robots on health outcomes were included and a pooled meta-analysis, vote counting by direction of effect, and narrative synthesis were applied. The second version of A MeaSurement Tool to Assess systematic Reviews (AMSTAR-2) was applied to assess quality of included reviews. Results: 35 reviews were identified, most focusing on older adults with or without dementia (n = 24). Pooled meta-analysis indicated no effect of socially assistive robots on quality of life (standardised mean difference (SMD) = 0.43), anxiety (SMD = -0.02), or depression (SMD = 0.21), although vote counting identified significant improvements in social interaction, mood, positive affect, loneliness, stress, and pain across the lifespan, and narrative synthesis


Introduction
Digital solutions are reshaping human interactions, both human-human and human-machine. This new landscape also affects the concepts of social connection and belonging, as these human needs are now located in a virtual and technological space which does not necessarily require physical interaction. It has been argued that humans evolved with a tendency towards social connection, facilitated by genetic, neural, and hormonal changes that promote bonding and companionship (1). According to Maslow's theory, achieving higher levels of self-actualisation in life means building a strong foundation of physical and psychological safety, followed by social belonging (2). If a higher need is compromised, the previous ones are also at risk. For example, if social connection or belonging are compromised, physical and psychological safety and basic needs are in danger too. Empirical evidence demonstrates the consistency of these theoretical assumptions: lack of social connection is associated with negative health outcomes across the lifespan including depression and anxiety in older adults (3), dementia risk (4), and physiological changes including disturbed sleep, increased body weight, and reduced immune functioning (5). Thus, non-pharmacological interventions often utilise social connection such as befriending (6), although this requires resources to coordinate and face-to-face interaction is not always feasible (7). The argument for interventions to increase social connection is aligned to the recent global drives towards person-centred care, which encourages a holistic approach to health and wellbeing (8) and provides a theoretical and empirical foundation to promote health and wellbeing from a wider perspective rather than through pharmacological intervention only. Within health and social care, the nursing profession in particular has long reflected such a holistic approach to the health and wellbeing of patients, driven by the provision of care and by creating a caring relationship between the nurse and the patient (9).

Journal Pre-proof
Technological advancement allows for alternative sources of social connection (10). Socially assistive robots (SARs), often resembling a human or pet, work with the help of AI-supported software and are able to interact with humans and form social relationships through a variety of sensors, without the need to touch (11). SARs may provide a potentially cost-effective intervention to promote wellbeing within health and social care settings which acknowledges already limited staff capacity. Whilst SARs were initially perceived as the antithesis of the quality of care provided by nurses (12), perceptions are slowly moving towards SARs as a potential tool in nursing care to provide companionship (13). SARs have been widely applied across the lifespan, although existing research mainly clusters around children and older adults (14). Namely, SARs have been reported to improve independent living, depression and social isolation in older adults (14,15), and demonstrate effectiveness in managing distress and pain in children with medical conditions such as cancer (16,17) or during medical procedures such as venipuncture (18) or intravenous insertion (19). Although intuitively the mechanism of effect of SARs during medical procedures may be distraction, a pilot study of intravenous insertion in paediatric patients found SARs to be effective in reducing pain only when programmed to display empathy in comparison to distraction behaviour (19). Indeed, within other health and social care contexts across the lifespan, observations suggest that SARs exert positive health and wellbeing outcomes through an empathic relationship and emotional bond (10,14,20,21), much like with a human or animal.
To date, there is no comparison of the effectiveness of SARs across the lifespan, as primary studies and existing reviews have largely been conducted in silos according to children and adolescents or older adults. Thus, it is uncertain whether the aforementioned clusters of findings related to effectiveness represent distinct mechanisms and resulting outcomes according to age group, or rather a dichotomy of research areas that are different across age groups. For example, it is uncertain whether SARs only reduce anxiety in children, or whether studies that measure the effectiveness of SARs on anxiety levels focus on children only. Given the vast amount of existing literature on SARs within a range of settings and populations, there is a need to synthesise the literature to identify when SARs are effective for which individuals and for which outcomes, through comparison of outcomes, subpopulations, and types of socially assistive robot, which an umbrella review methodology allows (22). An umbrella review also takes into account methodological quality of included primary studies and reviews, allowing findings to be weighted accordingly (22).
Consequently, umbrella reviews have been previously applied to amass evidence on effectiveness of technological interventions on health outcomes (23-25), although this methodology has not yet been applied to assess SARs.

Research objectives
Based on the existing literature, the aims of the current umbrella review were to investigate the effectiveness of SARs in improving the health and wellbeing of service users within health and social care contexts, compare their effectiveness with standard care and other available therapies on health and wellbeing outcomes within health and social care contexts, and investigate the interactions between outcomes and other factors as moderators or mediators of any identified effects, including sub-population. Thus, the primary outcomes were the psychosocial, behavioural, and physiological effects on health and wellbeing, and the secondary outcomes were any moderators or mediators of these effects or their interaction with each other. Reporting of subsequent methods adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (see Supplementary Material 1).

Protocol and registration
The protocol was pre-registered prior to the search via the International Prospective Register of Systematic Reviews (PROSPERO; registration number blinded for peer review). Deviations from this protocol are outlined within the section 'Deviations from the registered protocol' alongside justifications. Searches of PROSPERO, Joanna Briggs Institute Registries, and Open Science Framework confirmed that there were no existing protocols for an umbrella review with a similar scope.

Eligibility criteria
Inclusion criteria were formed around the PICOS (population, intervention, comparator, outcome, study type/setting) criteria (26). Any recipients of the intervention including all ages of the population were included, whilst reviews that reported effects on those not directly targeted by the intervention such as staff, parents, and guardians were excluded. The primary included outcomes were psychosocial (e.g. isolation, depression), behavioural (e.g. sleep, medication adherence), and physiological (e.g. pain, stress) effects on health and wellbeing, the latter included as an objective indicator of psychosocial and behavioural effects. Thus, reviews that only included non-health related psychological outcomes (including memory, cognitive functioning and learning), health related knowledge (e.g. Type 1 diabetes knowledge), emotions that do not directly relate to health (e.g. anger), and satisfaction with the socially assistive robot as outcomes were excluded. The secondary outcomes, namely moderators or mediators of the effect of SARs on health and wellbeing outcomes, or of interactions between outcomes, did not inform the inclusion criteria. Studies including all types of comparator groups were included; there were no exclusion criteria based on comparator. Included interventions were SARs with or without pharmacological support, defined as artificial intelligence in an embodied and physical form that is able to receive information from humans and respond accordingly, thus excluding virtual reality and voice activated speakers. Inclusion criteria for study type and setting were systematic reviews with or without meta-analysis across any country, of studies within health and social care settings. Criteria for systematic reviews were fulfilment of at least two of the following: 1) prespecified eligibility criteria (e.g. registration on PROSPERO or a protocol), 2) systematic methods (i.e. PRISMA) that are explicit and reproducible, and 3) risk of bias or quality assessment. Reviews including qualitative studies were excluded unless the results of quantitative studies were presented separately. Where reviews included studies from outside of health and social care contexts, only the findings from within health and social care contexts were included. Conference abstracts were excluded as insufficient reporting would not allow for full data extraction relating to study characteristics, findings, quality, and overlap (27).

Search strategy
Scoping searches were completed prior to the review search to develop the preliminary search strategy (Table 1), which was peer-reviewed by a librarian at Northumbria University using the Peer Review of Electronic Search Strategies checklist (PRESS) (28). [...] Reviews within the Cochrane Library, and EPISTEMONIKOS. Given that no primary literature on SARs has been identified before 2005, the search as applied across all databases was limited to 2005 onwards. Only reviews published in English were included, given that the primary reviewers are English language speakers with no translation services available.
The overall search strategy was modified accordingly to fit each database's requirements (see Supplementary Material 2 for the exact search applied for each database). Additionally, forward and backward citation searching of included reviews, together with suggestions from colleagues and other academics, provided further sources of reviews. As SARs are an increasingly emergent topic, multiple sources were employed to identify grey literature, including hand searching, OpenGrey, Google Scholar, and preprint servers such as medRxiv.

Selection of reviews
Search results were first exported via a RIS file into EndNote, where duplicates were recorded and removed. Remaining articles were uploaded onto Covidence (a systematic review management tool) (29) where they were screened against the pre-defined inclusion table by two independent reviewers (blinded for peer review), first for title and abstract, then for full text. For both stages, inter-rater reliability was calculated using Cohen's kappa statistic (30) and assessed using the conservative parameters from Altman (31). Any discrepancies were resolved through discussion and consultation with a third reviewer if necessary. Screening and resolution of discrepancies were conducted and logged via Covidence.
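As an illustration of the inter-rater reliability step, the following minimal sketch computes Cohen's kappa for two reviewers' screening decisions and maps it onto the agreement bands commonly attributed to Altman. The function names and the example decisions are hypothetical, and the band cut-offs are an assumption that should be checked against the cited source (31); this is not the calculation performed within Covidence.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

def altman_band(kappa):
    """Agreement bands as commonly cited from Altman (assumed cut-offs)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"

# Hypothetical title-and-abstract decisions for ten records.
reviewer_1 = ["in", "in", "out", "out", "in", "out", "out", "out", "in", "out"]
reviewer_2 = ["in", "in", "out", "out", "out", "out", "out", "out", "in", "in"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3), altman_band(kappa))
```

In this hypothetical example the reviewers agree on eight of ten records, but because half of that agreement is expected by chance, kappa is noticeably lower than the raw agreement rate.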

Data extraction and assessment of methodological quality
The data extraction form (Supplementary Material 3) was built using Cochrane guidance (32) and tailored in accordance with the research question and methodology. Extracted data included information about the review, characteristics and findings of included primary studies, reports of any meta-analyses or reasoning for not conducting one, indications of included primary study and review quality, and record of any correspondence required and received from review authors. Again, data extraction was performed in duplicate by two independent reviewers (blinded for peer review), with any disagreements resolved through discussion. Only data relevant to the aims of the current umbrella review were extracted from included reviews.
Review quality was assessed using the AMSTAR (A MeaSurement Tool to Assess systematic Reviews) 2 (33) and performed in duplicate by two independent reviewers (blinded for peer review). Again, any discrepancies were resolved through discussion or consultation with a third reviewer (blinded for peer review), if required. The AMSTAR 2 checklist assesses whether the review set a clear research question with a pre-registered protocol and clearly defined inclusion criteria, whether a comprehensive search strategy was employed with independent study selection and extraction by a second reviewer, whether key information was clearly described, and whether an appropriate method was selected and applied to synthesise evidence which assessed and accounted for risk of bias. Overall score was calculated using the following criteria: no = -2, yes = 2, and partial yes = 1. Results of AMSTAR 2 assessments were used to inform confidence in included reviews through application of scoring criteria previously applied within a similar umbrella review (23). Namely, AMSTAR 2 items are divided into six critical weaknesses (protocol registration, adequacy of search strategy, risk-of-bias assessment, appropriateness of meta-analytical methods, use of risk of bias during interpretation, and assessment of publication bias) and 10 non-critical weaknesses (the remaining items). Reviews were categorised into high (<3 non-critical weaknesses, 0 critical weaknesses), moderate (<3 non-critical weaknesses, 1 critical weakness), low (<3 non-critical weaknesses, >1 critical weakness), or critically low (>3 non-critical weaknesses, >1 critical weakness) confidence. Although quality of included reviews was taken into account when interpreting the results, no reviews were excluded based on quality, retaining a comprehensive overview.
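The scoring and categorisation rules above can be sketched in a few lines. This is a hedged illustration only: the function names are hypothetical, the assumption that 13 of the 16 items apply to a review without meta-analysis is inferred from the AMSTAR 2 checklist, and combinations of weaknesses that the stated rules do not cover are returned as unclassified rather than guessed.

```python
# Item-level scores as stated in the text.
ITEM_SCORES = {"yes": 2, "partial yes": 1, "no": -2}

def overall_score(ratings):
    """Sum item scores; items that do not apply are simply left out of `ratings`."""
    return sum(ITEM_SCORES[rating] for rating in ratings)

def confidence(non_critical, critical):
    """Apply the stated confidence rules literally.

    Returns None for combinations the stated rules do not cover
    (for example exactly three non-critical weaknesses).
    """
    if non_critical < 3 and critical == 0:
        return "high"
    if non_critical < 3 and critical == 1:
        return "moderate"
    if non_critical < 3 and critical > 1:
        return "low"
    if non_critical > 3 and critical > 1:
        return "critically low"
    return None

# Assumption: 13 applicable items for a review without meta-analysis.
worst_case = overall_score(["no"] * 13)
print(worst_case)          # lowest possible score under this assumption
print(confidence(5, 3))    # many weaknesses of both kinds
```

Under the 13-item assumption, a review rated 'no' on every applicable item scores -26, the lowest value possible on this scale.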

Deviations from the registered protocol
There were no deviations from the registered protocol.

Strategy for data synthesis
Overlap was calculated using the formula for corrected covered area (34), and categorised into slight (0-5%), moderate (6-10%), high (11-15%), or very high (>15%) overlap. Only primary studies relevant to the research question were entered into the overlap table, and one overall overlap statistic was calculated given that included primary studies ranged across the lifespan.
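For readers unfamiliar with corrected covered area (CCA), a minimal sketch follows. It uses the standard formula CCA = (N - r) / (r x c - r), where N is the number of study inclusions counted across all reviews, r the number of unique primary studies, and c the number of reviews. The citation matrix shown is hypothetical, and treating the band boundaries as 5%, 10%, and 15% is an assumption, since the stated bands leave the values between whole percentages unspecified.

```python
def corrected_covered_area(matrix):
    """CCA for a citation matrix: rows are unique primary studies,
    columns are reviews, entries are 1 if the review includes the study."""
    r = len(matrix)                      # unique primary studies
    c = len(matrix[0])                   # reviews
    if c < 2:
        raise ValueError("overlap needs at least two reviews")
    n = sum(sum(row) for row in matrix)  # inclusions, counting repeats
    return (n - r) / (r * c - r)

def overlap_band(cca):
    """Map a CCA proportion onto the overlap bands (assumed boundaries)."""
    pct = cca * 100
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"

# Hypothetical example: four primary studies across three reviews.
example = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
cca = corrected_covered_area(example)
print(f"{cca:.1%}", overlap_band(cca))
```

A CCA of 0 means that every primary study appears in exactly one review, whereas a CCA of 1 means that every review includes every primary study.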
Given the low number of included reviews that applied a meta-analysis, narrative synthesis was applied to synthesise findings of included reviews where a meta-analysis was not conducted. This was assisted by vote counting by direction of effect (35), tested for significance using a two-tailed binomial test with the null assumption of positive effects at a 50% proportion (35). It is noteworthy that vote counting by direction of effect rather than significance was dependent on the reporting of reviews, as a primary study could not be included if reviews reported no significant differences without providing the direction of effect.
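The significance test applied to the vote counts can be illustrated with an exact two-tailed binomial test against a 50% null proportion. The sketch below uses only the Python standard library; the function name and the example counts are hypothetical, and the two-tailed p-value is defined here as the total probability of all outcomes no more likely than the observed one, which is one common convention.

```python
from math import comb

def vote_count_p_value(positive, total, null_proportion=0.5):
    """Exact two-tailed binomial p-value for `positive` out of `total`
    studies showing a beneficial direction of effect."""
    pmf = [
        comb(total, k) * null_proportion**k * (1 - null_proportion)**(total - k)
        for k in range(total + 1)
    ]
    observed = pmf[positive]
    # Sum the probability of every outcome at least as extreme as observed;
    # the small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

# Hypothetical outcome: 9 of 10 primary studies favour the intervention.
p = vote_count_p_value(9, 10)
print(round(p, 4))
```

Because the test uses only the direction of each effect, it says nothing about effect size, which is why the vote counting results are read alongside the quantitative analyses.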
Secondary outcomes of moderators and mediators of the effect of SARs on health and wellbeing outcomes, or of interactions between outcomes, were also reported.
A pooled meta-analysis was applied, including the results of primary studies within the included meta-analyses. A meta-meta-analysis was judged to be inappropriate for two reasons. Firstly, there was a high proportion of overlap amongst the included primary studies within the included meta-analyses, such that a meta-meta-analysis would inflate the effects of the duplicated primary studies (36). Secondly, during data extraction it was apparent there were major inconsistencies in the reporting of values for primary studies. Namely, the results for the control and intervention groups were often switched, and some values were reported as negative values when the result reported by the primary study was positive, inconsistencies that became apparent due to the variability in reporting of values for the same primary study (even when accounting for multiple timepoints of primary studies). Subsequently, data were extracted from the primary studies to ensure accurate extraction. Nonetheless, a sensitivity analysis was performed using the meta-meta-analysis to demonstrate how the outcome may be affected by the overlap and misreporting. A random effects model was applied for all pooled meta-analyses.
The total effect and heterogeneity indexes were estimated using SPSS v28 (37). Random-effects estimates were based on the effect size calculation from each study and the total estimate of the effect size. In detail, the Profile Likelihood (PL) random-effects model was adopted to estimate the overall effect and to include an estimation of heterogeneity in the weighting (38).
Heterogeneity was determined using the Q-statistic in the χ² distribution and the p-value (39): a significant p-value indicated that heterogeneity could affect the results. Heterogeneity was further assessed by calculating the I² statistic (40). According to the Cochrane standards, heterogeneity is not important if I² ranges from 0% to 40%, moderate from 30% to 60%, substantial from 50% to 90% and considerable from 75% to 100% (41). The τ² statistic was also determined, as an estimate of the amount of variation between the included studies.
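To make the heterogeneity statistics concrete, the sketch below computes Q, I², τ², and a random-effects pooled estimate from study-level effect sizes and variances. Note that it swaps in the simpler DerSimonian-Laird moment estimator of τ² rather than the Profile Likelihood model fitted in SPSS for this review, so values would not match the reported analyses exactly. The function name and input values are hypothetical, and the p-value for Q is omitted because it requires a χ² distribution function outside the standard library.

```python
from math import sqrt

def random_effects_summary(effects, variances):
    """Q, I-squared (%), tau-squared (DerSimonian-Laird), and pooled effect."""
    weights = [1 / v for v in variances]            # inverse-variance weights
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    scale = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau-squared to each study's variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = sqrt(1 / sum(re_weights))
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2, "pooled": pooled, "se": se}

# Hypothetical standardised mean differences and their variances.
summary = random_effects_summary([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print({key: round(value, 3) for key, value in summary.items()})
```

In this hypothetical example I² falls in the region where the Cochrane bands for moderate and substantial heterogeneity overlap, which illustrates why those bands are interpreted alongside Q and τ² rather than in isolation.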
A funnel plot was included to visually represent the estimation of the treatment's effect in the studies included in a meta-analysis and to represent a possible publication bias: if a publication bias exists, the funnel plot has an asymmetrical appearance and the meta-analysis could overestimate the treatment's effect (42).
Findings of the quantitative analyses and vote counting and narrative synthesis are presented in terms of behavioural, psychosocial, or physiological outcomes. Finally, a figure was created to summarise the strength of the available evidence to support the effect of SARs on each outcome, adhering to good practice for umbrella reviews (43). Findings were triangulated from pooled meta-analysis, meta-analysis of included reviews, vote counting by direction of effect, and narrative synthesis where required. In the absence of standardised guidance (43), each outcome was rated according to amount (well researched (>10 studies), moderately researched (7-9 studies), or insufficiently researched (<7 studies)), direction (positive, negative, or no overall effect), and consistency (consistent corroborated by quantitative analysis (meta-analysis or pooled meta-analysis), consistent across individual studies (vote counting only), mostly consistent, or inconsistent) of evidence to give an overall assessment of 'strong positive' (moderately or well researched, positive, and consistent corroborated by quantitative analysis), 'weak positive' (moderately or well researched, positive, and consistent across individual studies or mostly consistent), 'suggestive positive' (insufficiently researched, positive, and consistent across individual studies or mostly consistent), or 'non-significant' (no overall effect, inconsistent).
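The rating scheme described above is effectively a small decision rule, which may be easier to follow when written out. The sketch below is an illustration under stated assumptions: the function name and the category labels used as inputs are hypothetical, the amount of evidence is passed in as an already assigned category rather than derived from a study count, and any combination not named in the text is returned as unrated rather than inferred.

```python
def rate_evidence(amount, direction, consistency):
    """Combine amount, direction, and consistency into an overall rating.

    amount:      "well", "moderately", or "insufficiently" (researched)
    direction:   "positive", "negative", or "none"
    consistency: "quantitative" (corroborated by meta-analysis),
                 "individual" (vote counting only),
                 "mostly", or "inconsistent"
    """
    researched = amount in ("well", "moderately")
    across_studies = consistency in ("individual", "mostly")
    if direction == "positive":
        if researched and consistency == "quantitative":
            return "strong positive"
        if researched and across_studies:
            return "weak positive"
        if amount == "insufficiently" and across_studies:
            return "suggestive positive"
    if direction == "none" and consistency == "inconsistent":
        return "non-significant"
    return None  # combination not described by the stated scheme

print(rate_evidence("well", "positive", "quantitative"))
print(rate_evidence("insufficiently", "positive", "mostly"))
```

Returning no rating for unlisted combinations keeps the sketch faithful to the stated scheme instead of extending it.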

The selection process is displayed within the PRISMA (44) diagram (Figure 1). The search identified a total of 9936 articles, of which 35 reviews were included for analysis in the current umbrella review (see Supplementary Material 4 for references and reason for exclusion for articles excluded based on full text), with ten eligible for quantitative analysis.
The main characteristics of included reviews are shown in Supplementary Material 5. The scope of most included reviews was older adults (n = 24), with most of those further specifying people with dementia (n = 13). Only five focused on children (45-49), and six either focused on adults or did not specify inclusion criteria based on population. Seven reviews focused on animal-like SARs (50-56) (two specifically on PARO (55,56)) and three on humanoid SARs (46,57,58) (one each specifically on Telenoid (57) and NAO (58)).
Of those reviews that did not exclude studies based on type of socially assistive robot, PARO was the most common socially assistive robot within the literature, followed by NAO (see Table 2 for a description of the most commonly utilised SARs by primary studies). SARs acted as a companion, distraction, educator, or motivator. Most commonly, SARs adopted a companion role to older adults or hospitalised children during their stay. However, SARs were also utilised as a distraction for children during medical procedures such as IV insertion (48,59), as educators of disease management such as type 1 diabetes to children (46,58), and motivators to encourage exercise (60,61), healthy habits (58), or rehabilitation (58).
Frequency of SARs interventions ranged from a single session (45,46,49-51,62-64) to five times weekly (59,65-67) and over a 15-week period (68), and duration ranged from 3 minutes (45,67) to constant availability (63,69), in both group and individual formats. Most timepoints of measurement were immediately after exposure, although some follow-ups occurred as long as 8 months (50) after the start of the intervention. SARs interventions that included a control group were most commonly compared to usual care, although other control groups included usual activities such as reading, music, and art therapy, a live dog, and plush toys or SARs with their batteries removed. Psychosocial outcomes were the most common outcomes discussed by reviews, specifically depression, anxiety, agitation, and quality of life. Reviews focusing on children were more likely to report on pain and distress, and reviews focused on older adults were more likely to discuss agitation, loneliness, apathy, and neuropsychiatric symptoms. Eleven reviews included only randomised controlled trials (RCTs) (52,53,55,56,59,65,70-74). Searches ran from inception to August 2022 (70,72), with sample sizes of included primary studies ranging from 2 to 455. Primary studies were published between 2001 and 2021 and conducted across a range of high-income countries including the US, Australia, Japan, Italy, Netherlands, New Zealand, and Denmark.

Quality of included evidence
Interrater reliability of quality appraisal was moderate (46.22%). After a consensus was reached through discussion, quality appraisal scores ranged from -26 (the minimum for a systematic review with no meta-analysis) to 13 (see Table 3 for scoring of individual items), with only ten reviews achieving more AMSTAR 2 checklist items than not (a positive overall score). Most reviews conducted independent screening (85%), although independent extraction was less frequent (47%). No reviews reported on the funding source of included primary studies, and only one provided a list of excluded studies based on full text screening or explained their inclusion criteria for study design. Very few reviews used a comprehensive literature search (15%), with many reviews scored negatively for not explaining language restrictions. Also, few reviews considered the influence of risk of bias when analysing (29%) or discussing (26%) results, with the few reviews that did achieve this item excluding studies based on risk of bias. Subsequently, confidence in all included reviews was rated as critically low, aside from four which achieved low confidence (49,62,67,75), and one which achieved moderate confidence (69).

Study overlap
Corrected covered area showed only slight overlap across primary studies (2.9%). However, overlap was particularly high within included meta-analyses, likely as meta-analyses were only performed within the sub-population of older adults. For example, 10 meta-analyses for depression had been performed utilising a total of 10 primary studies.

Psychosocial outcomes
All pooled meta-analyses included older adults living with or without dementia and related to psychosocial outcomes. Data on 469 individuals pooled from ten primary studies (Figure 2)
Again, meta-analysis results indicate one primary study (Moyle et al., 2013, in Supplementary Material 6) to be questionable. Asymmetrical funnel plots for depression, anxiety, and quality of life indicate potential publication bias (see Figure 5).
For outcomes not suitable for pooled meta-analyses, results of each meta-analysis are provided in Supplementary Material 7. Meta-analyses consistently reported small but significant effects of SARs on social interaction and loneliness, and most reviews reported a small significant effect on positive affect. Meta-analyses did not support an effect on apathy or neuropsychiatric symptoms, and there were mixed results regarding a significant effect on agitation.

The findings from vote counting by direction of effect are shown in Table 4. Loneliness, social interaction, mood, and positive affect all showed strong evidence to support an effect, regardless of effect size. Furthermore, the effect of SARs on anxiety was significant, although a minority of primary studies reported an increase in anxiety after exposure to SARs.
Meanwhile, the evidence for depression, agitation, neuropsychiatric symptoms, apathy, and quality of life was inconsistent, indicating a weak level of evidence to support the effect of SARs on these outcomes. Results for motivation and knowledge of one's disease were promising but insufficient to reach statistical significance. Assessment of non-significant findings reported without detail of the direction of effect supports the vote counting results, namely that there are numerous non-significant findings of primary studies of effects on quality of life, neuropsychiatric symptoms, and agitation.

Physiological outcomes
Whilst pain showed strong evidence to support an effect of SARs, the little available evidence on distress indicated an effect but was insufficient to reach statistical significance. Although the findings from vote counting indicated strong evidence for SARs in decreasing self-reported stress, there was insufficient evidence to support the efficacy of SARs in reducing biochemical indicators of stress. There was too little evidence to apply vote counting to Body Mass Index, although the available evidence indicated no significant effect (62,74).

Behavioural outcomes
Sleep, medication use, and activity all showed inconsistent evidence and thus indicated weak support for the effect of SARs on behavioural outcomes. There was too little evidence on adherence to medication to apply vote counting, although the available evidence indicated no significant effect (74), and one primary study reported an increase in activities of daily living (75). Although the direction of effect was not provided, numerous non-significant effects on number of medications were reported. As a secondary outcome, one review reported on four studies that identified an improvement in disease management (46). Triangulation from vote counting and quantitative analysis findings are displayed in Table 4.

Results according to quality of included reviews and population subgroups
There was a trend within one review that lower quality studies were more likely to report an effect regarding agitation, anxiety, depression, and quality of life (67). No other reviews assessed for an effect of quality of primary studies. Although some effects including reduced pain were retained across different measurement tools (59), one review reported a significant effect on agitation only through video observation and not using the Cohen-Mansfield Agitation Inventory Short Form (53). Similarly, larger effects on depression were observed when the Cornell Scale for Depression in Dementia was applied within adults with dementia (71) and when the Geriatric Depression Scale was applied within adults without dementia (69), although both reviews reported no effect at follow-up (69,71).
Whilst SARs reduced the need for pain and psychotropic medication (51,56,68,74,75), no effect was found for medication for sleep (51,56,74), depression (51,56,74), or dementia (75). Similarly, whilst there was evidence that SARs reduced the neuropsychiatric symptoms of delusions (50,58) and irrational beliefs, SARs also reportedly increased hallucinations (64,68). Reviews reporting on different aspects of sleep suggested a significant decrease in hours of daytime sleep (56,65) and nighttime activity (53,56,70,73,75), although also a decrease in daytime activity (53), perhaps explaining the mixed results identified within vote counting. Regarding biochemical indicators of stress, one primary study reporting a decrease in pulse oximetry and pulse rate was identified as questionable within quantitative analysis of independent outcomes (see 'results of quantitative analysis').

There was some evidence of a greater reduction in depression for adults without dementia compared to those with dementia from the review with the lowest risk of bias (69), possibly linked to the findings that dementia (58) and behaviour or mood problems (62) negatively influenced socially assistive robot interactions.However, dementia measurements were either self-reported or rated by intervention staff, which could sometimes conflict (75).There were some instances of SARs as most effective for those with the most severe symptoms, including for loneliness (60) and psychotropic medication use for those with severe rather than mild or moderate dementia (56).Whilst there was no effect of SARs and anxiety overall, all reviews within children specifically reported a reduction in anxiety (47)(48)(49), which were mostly significant.Contrastingly, the reduction in pain was identified both within samples of children (48,49,59,62) and as reducing neuropathic pain in adults (59).Regarding cultural differences as a moderator of engagement with SARs, Italian (46) and Japanese (62) children were more engaged with SARs than Dutch (46) and Serbian (62) children respectively, and engagement was amplified for American older adults (60).
SARs encouraged social interaction with the socially assistive robot itself (50), but also with other participants (50,60,62-64), setting staff (48,63,64,69,73), family and friends (48,60), and the deliverer of the intervention (48,50,60). Whilst some reviews reported no difference between SARs interventions delivered in a group and those delivered individually (53,75), including for depression (76), group interventions were more effective for reducing anxiety in people with dementia (76). In one review, the only study that found a direction of effect on depression sampled women living alone (78), and the presence of a parent reduced pain (48,49) and anxiety (48) compared to the child interacting with the socially assistive robot alone.
One confounder to this potential moderator is that adherence to the socially assistive intervention was lower when using PARO at home (67). Few reviews reported the effect of the type of socially assistive robot, although one reported no effect on depression and anxiety outcomes (76). Whilst length of each session and total weekly exposure time to the socially assistive robot positively predicted a reduction in depression, frequency of sessions and overall duration of the intervention negatively impacted depression scores (73), indicating longer and fewer sessions as optimal. In contrast, one review suggested two to three 15-20 minute sessions weekly to avoid participants losing focus, and because similar levels of interaction were reported across differing numbers of sessions (57). Duration may also be a confounding factor, given that one review reported that sessions for children tended to be longer than sessions for older adults with dementia (48).
What was described as usual care was generally inconsistent across primary studies (76), making comparison difficult. When compared to a live dog, SARs generally showed no difference in their effectiveness on agitation, depression, quality of life (54), or anxiety (49).
When compared to virtual reality, SARs were reported as more effective for reducing anxiety but not depression, for which there was no significant difference (77). A possible explanation is that the relationship built with a socially assistive robot was stronger than with a virtual avatar, which in turn predicted increased use (46,48). Reviews that compared the effectiveness of SARs separately by comparison group reported improvements in health outcomes compared to usual care (51,68,79) but not compared to a plush toy (51,68,79) or live dog (50). Perhaps relatedly, the literature was mixed as to whether SARs elicited increased engagement compared to a plush toy (48,67,79) or live dog (51,62,73), or no difference compared to a plush toy (51,60,62) or live dog (50,67), with a minority of evidence to suggest increased engagement compared to usual care (51).

Technical issues were a barrier to SARs interventions (57,67,69), including issues with movement (57), SARs being perceived as childish or making strange noises (55,67), and the need for an internet connection and sufficient training to operate the socially assistive robot (57). Furthermore, some reviews reported issues with accessibility (57,58,69), including SARs being less appropriate for those with intellectual disability and hearing or vision impairment (58).

Discussion
A summary of strength of evidence for each outcome is displayed in Table 5. There is mixed evidence surrounding psychosocial outcomes; whilst there is promising evidence to suggest an effect of SARs on loneliness, social interaction, mood, and positive affect, more research is needed to validate effects on motivation and knowledge of one's disease. The available evidence indicates no effect on depression, quality of life, agitation, apathy, or neuropsychiatric symptoms. Whilst quantitative analysis suggests no effect on anxiety, comparison across sample groups indicates an effect of SARs on anxiety in children but not older adults. The evidence for physiological outcomes is promising specifically for pain and stress, although more research is required to validate effects, particularly for distress and biochemical indicators of stress such as pulse oximetry and cortisol. There is limited evidence to support an effect of SARs on behavioural outcomes, indicating no effect on sleep, medication use, and activity.
Overall, the observed effects of SARs relate to non-psychiatric outcomes including positive affect, loneliness, and mood, with SARs appearing unable to treat psychiatric symptoms. This pattern is generally reflected in the available literature on animal-assisted therapy (80), and findings of an improvement in psychiatric symptoms may be attributable to a holistic effect of animal-assisted therapy within nature (81). Although the effect of animal-assisted therapy on loneliness is more mixed than for SARs (80,82), the positive impact of both SARs and animal-assisted therapy on loneliness is not mediated by attachment (82). As SARs and animal-assisted therapies exert similar effects through similar mechanisms, the choice of intervention should depend on their relevant costs. It is also noteworthy that the primary studies supporting both interventions are inherently biased, in that only participants who like, accept, or are not fearful of the socially assistive robot (50,55) or animal (81) are included. This is a particular issue for SARs given that some reviews identified a lack of acceptability (63), including that some residents showed fear or disgust that did not dissipate with repeated interaction (57). Furthermore, given that similar effects were observed between SARs and plush toys, it is questionable whether an interaction with participants is required at all to deliver the same effects. Thus, a cost-benefit analysis is needed to explore whether the significant but small effects of SARs justify their price.
Findings also suggest an influence of the presence of dementia and other psychological disorders, possibly mediating or moderating the effect of socially assistive robots on health and wellbeing. This is particularly noteworthy given that most of the available literature within adults includes people with dementia. It is conceivable that cognitive impairment may predict different interactions with SARs, and that a distortion of reality observed in people with dementia may affect the 'uncanny valley', by which uncomfortable feelings arise when interacting with something resembling but not quite reaching a human form. Whilst there is no available literature on the 'uncanny valley' within cognitive impairment, people with Autism Spectrum Disorder have been found to show a higher threshold to its effect (83). Thus, further exploration is needed on the effect of SARs within adults without cognitive impairment.
Although a mechanism of SARs on mood related outcomes was not formally explored by included reviews, findings suggest that SARs stimulate social interaction, either with staff, other participants, or parents of children. The established positive effects of social interaction on wellbeing (84) support social interaction as a feasible mediator. However, SARs have also been reported as effective for increasing wellbeing in individual settings, where the socially assistive robot may act as the companion, rather than stimulating companionship with others within group sessions. This would align with literature that finds attachment to SARs does not account for the improvement in loneliness in nursing homes (82).
Consequently, a mediation analysis would be useful to confirm this suggested mechanism in group settings, and to further explore the mechanism within individual interventions.

Strengths and limitations
This umbrella review possesses several strengths and limitations. No additional studies were added after extensive searching of other sources outside of the search, indicating a comprehensive search strategy. An umbrella review methodology allowed us to assess the current literature in this area from a high-level perspective, identifying the large number of reviews within older adults, which overstate the low number of primary studies, and errors in reporting these primary studies that could significantly mislead their data analyses and subsequent interpretation. The errors identified highlight an urgent need for improved accuracy in the reporting of primary studies, and raise the important issue of errors being built into reviews and umbrella reviews, indicating a need for vigilance. Limitations that affect the generalisability of findings include the English language limitation of the search and that all primary evidence is from high income countries. The findings of the current review are also limited by the considerable risk of bias of included reviews and the overlap of primary studies across meta-analyses, as reviews, particularly within older adults, do not add anything to existing knowledge. Additionally, quantitative analysis identified one questionable primary study that should be treated with caution when interpreting findings.


Conclusion
SARs are effective on mood related outcomes and encourage social interaction, although the magnitude of these effects is small and may be comparable to considerably more cost-effective plush toys. Within group settings, positive effects appear to be exerted through an increase in social interaction with staff, friends and family, and other participants, although a formal mediation analysis is required to confirm this. Thus, the main mechanism of SARs appears not to replace social connection, but to encourage it amongst other humans. Future research would first benefit from agreeing a consensus definition of SARs to ensure consistent comparison. More research is required to explore the effect of SARs on physiological outcomes, and the evidence of SARs on behavioural outcomes is generally weak. Future research should therefore quantitatively assess the overall effect of SARs on physiological outcomes through meta-analysis, as the current magnitude of the identified positive effects aggregated across primary studies is uncertain. More broadly, the current umbrella review identified that although the literature on pain, anxiety, and distress in children is promising, a meta-analysis is needed to explore the magnitude of effects.

Robotic soft and colourful bear resembling a plush toy. Able to move and talk.

Table 3: AMSTAR quality ratings and an overall quality judgment for each review.
Q1: Did the research questions and inclusion criteria for the review include the components of PICO?
Q2: Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol?
Q3: Did the review authors explain their selection of the study designs for inclusion in the review?
Q4: Did the review authors use a comprehensive literature search strategy?
Q5: Did the review authors perform study selection in duplicate?
Q6: Did the review authors perform data extraction in duplicate?
Q7: Did the review authors provide a list of excluded studies and justify the exclusions?
Q8: Did the review authors describe the included studies in adequate detail?
Q9: Did the review authors use a satisfactory technique for assessing the risk of bias in individual studies that were included in the review?
Q10: Did the review authors report on the sources of funding for the studies included in the review?
Q11: If meta-analysis was performed did the review authors use appropriate methods for statistical combination of results?
Q12: If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis?
Q13: Did the review authors account for RoB in individual studies when interpreting/discussing the results of the review?
Q14: Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review?
Q15: If they performed quantitative synthesis did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review?
Q16: Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review?

Figure 1: PRISMA diagram delineating the article selection process for the current systematic review.

Figure 2: Forest plot for pooled meta-analysis for the outcome depression.

Figure 3: Forest plot for pooled meta-analysis for the outcome anxiety. SeR = self-reported anxiety, StR = staff reported anxiety, PT = when compared to a plush toy, UC = when compared to usual care.

Figure 4: Forest plots for pooled meta-analysis for the outcome quality of life. SeR = self-reported, StR = staff reported, C = when compared to the control group, D = when compared to a live dog.

Figure 5: Funnel plots for pooled meta-analysis for the outcomes depression, anxiety, and quality of life.

Table 1: Search strategy.

Table 2: Description of the most commonly included socially assistive robots.

Table 4: Vote counting by direction of effect rather than statistical significance, where reported within included reviews. Code for describing 'strength of evidence': all (or above 90%) in favour and significant = Strong; most in favour and significant = Moderate; all in favour but non-significant = Insufficient evidence; highly mixed = Inconsistent findings.
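As an illustration only, the 'strength of evidence' code above could be applied to a set of findings roughly as in the following Python sketch. It is not the authors' procedure. The `Finding` record, the function name, and the `most_threshold` parameter are hypothetical; reading "most" as more than 50%, treating all-in-favour findings that do not reach that share of significant results as "Insufficient evidence", and assigning every remaining pattern to "Inconsistent findings" are assumptions made here, because the caption does not define those boundaries.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Finding:
    """One reported effect for an outcome (hypothetical record, not from the review)."""
    in_favour: bool    # direction of effect favours the socially assistive robot
    significant: bool  # effect reported as statistically significant


def strength_of_evidence(findings: Iterable[Finding], most_threshold: float = 0.5) -> str:
    """Map a set of findings onto the four labels of the Table 4 code.

    Assumption: `most_threshold` (default 0.5) is our reading of "most";
    the 90% cut-off for "Strong" is the only threshold stated in the caption.
    """
    items = list(findings)
    if not items:
        raise ValueError("at least one finding is required")

    n = len(items)
    n_favour = sum(f.in_favour for f in items)
    n_favour_sig = sum(f.in_favour and f.significant for f in items)
    share_favour_sig = n_favour_sig / n

    if share_favour_sig > 0.9:             # all (or above 90%) in favour and significant
        return "Strong"
    if share_favour_sig > most_threshold:  # most in favour and significant
        return "Moderate"
    if n_favour == n:                      # all in favour, but mostly non-significant
        return "Insufficient evidence"
    return "Inconsistent findings"         # assumed catch-all for mixed directions


# Hypothetical example: three findings in favour, none significant.
example = [Finding(in_favour=True, significant=False) for _ in range(3)]
print(strength_of_evidence(example))
```

Making the thresholds explicit parameters is a design choice of this sketch; it allows the sensitivity of a label to the definition of "most" to be checked directly.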