A systematic review evaluating the implementation of technologies to assess, monitor and treat neurodevelopmental disorders: A map of the current evidence

• Technology has been used clinically across multiple neurodevelopmental disorders.
• Many studies show clinical effectiveness, but some have poor quality ratings.
• Both healthcare and service users hold largely positive views about technology.
• More research on service delivery efficiencies and cost-savings is required.
• More collaboration between clinicians, academics, patients and industry is needed.

[...]ed, 808 full-texts were screened, resulting in 47 included papers. These studies were appraised and synthesised according to the following outcomes of interest: effectiveness (clinical effectiveness/service delivery efficiencies), economic impact, and user impact (acceptability/feasibility). The findings describe how technology is currently being utilised clinically, highlight gaps in knowledge, and discuss future research needs. Technology has been used to facilitate assessment and treatment across multiple NDD, especially autism spectrum disorder (ASD) and attention-deficit/hyperactivity disorder (ADHD). Technologies include mobile apps/tablets, robots, gaming, computerised tests, videos, and virtual reality. The outcomes presented largely focus on the clinical effectiveness of the technology, with approximately half the papers demonstrating some degree of effectiveness; however, the methodological quality of many studies is limited. Further research should focus on randomised controlled trial designs with longer follow-up periods, incorporating an economic evaluation, as well as qualitative studies including process evaluations and user impact.


https://doi.org/10.1016/j.cpr.2020.101870 Received 27 November 2019; Received in revised form 5 March 2020; Accepted 24 May 2020

Introduction
There are substantial differences in the prevalence of neurodevelopmental disorders (NDDs) across countries, with rates being generally lower in Europe/United Kingdom (UK) than in the United States of America (USA) (Cleaton & Kirby, 2018). NDDs are a group of conditions which have their origins during the early stages of child development, although they are often lifelong conditions. For the purposes of this review, NDDs will be defined according to Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association [APA], 2013) criteria as being characterized by developmental deficits that impair personal, social, academic or occupational functioning. The most common NDDs include attention-deficit/hyperactivity disorder (ADHD), specific learning disorders, impairments affecting speech, language and communication, autism spectrum disorder (ASD) and specific and moderate learning difficulties. NDDs frequently co-occur with one another (Gargaro, Rinehart, Bradshaw, Tonge, & Sheppard, 2011; Kalyva, Kyriazi, Vargiami, & Zafeiriou, 2016; Robertson, 2006), making assessment, diagnosis and the selection of effective interventions a complex, lengthy and costly process. Furthermore, the chronic nature of these conditions represents a significant health and cost burden for families and society, being associated with parental ill health, time off work and loss of earnings. For example, Hurley-Hanson and colleagues (2019) report the projected annual costs for autism in the USA ($461 billion by 2025) and note that such costs are not unique to the USA and include medical care, support services, housing, transport and education, as well as employment difficulties and psychological and emotional effects on families.
Estimates for ASD in the UK are £32bn a year, of which 56% is accounted for by service use (Buescher, Cidav, Knapp, & Mandell, 2014). Similarly, the total costs for ADHD have been documented as $143-266 billion annually in the USA with the largest cost burden being loss of income and productivity for adults as well as health care and education (Doshi et al., 2012). Given the increasing administrative prevalence of NDD (Collishaw, 2015) and recognition of NDD as a lifespan condition, there is growing demand for more effective and efficient health services.
Patients referred for an NDD assessment often experience substantial delays in receiving a diagnosis; for example, a recent randomised controlled trial (RCT) showed that 40% of families referred for an ADHD assessment were still awaiting a diagnosis six months after initial assessment (Hollis et al., 2018a). Once a diagnosis is made, families report significant delays in treatment initiation and unsatisfactory levels of treatment monitoring. Only one in five young people with Tourette Syndrome (TS) are able to access behavioural therapy for tics, and those who do receive therapy typically receive less than half the recommended number of sessions (Cuenca et al., 2015; Verdellen, Keijsers, Cath, & Hoogduin, 2004). As a result of these time delays and limited access to treatment, children may not be fully benefiting from the intervention, which may have negative consequences for their continuing social, societal and academic performance. Reasons for poor service delivery for NDD include lack of access to trained therapists, particularly in geographically remote regions, and insufficient clinical time to deliver best care practices.
It is clear there is a need to identify cost-effective, easy-to-access strategies for assessing, treating and monitoring NDD. Technology offers the opportunity to deliver automated and self-directed interventions, improve access to therapy, allow therapies to be delivered over a distance (Hall et al., 2019), ease the patient pressure of face-to-face consultations, improve clinical effectiveness and personalise treatment approaches. Not surprisingly, the rate of technological interventions aimed at mental health is growing exponentially; for example, in 2017 there were in excess of 10,000 applications (apps) relating to mental health, with the number increasing daily (Torous & Roberts, 2017). Given the claims of significant improvement to health care services and patient experience, it is important that these technologies are independently and empirically evaluated. In the UK, the National Health Service (NHS) Five Year Forward View and Personalised Health and Care 2020 (Department of Health, 2014) describe the transformative potential of technology to drive efficiencies, improve outcomes and widen access in healthcare delivery in the NHS. The term 'technology' encompasses a broad range of devices, modalities and techniques. For the purpose of this review, we used a modified version of the National Institute for Health Research (NIHR) (www.nihr.ac.uk) definition of technologies to include: virtual reality assessment/therapy, digital technologies, telehealth, computer-based assessment/therapies, real-time monitoring and wearable devices, smartphone apps and sensors that can improve patient outcome and health service efficiencies. We excluded technology relating to neuroimaging, biomarker tests/devices, neuro-stimulation/modulation/feedback due to the area being well covered by other recent reviews (Finisguerra, Borgatti, & Urgesi, 2019; Mcvoy et al., 2019; Van Doren et al., 2019).
In addition, administrative technologies that patients/families with NDD cannot access or interact with directly, such as electronic health records (EHRs), were excluded. Given the wealth of data in telehealth, this was analysed as a separate review (see Methods section).
Recent systematic reviews have summarised the available evidence base for specific technologies or disorders, such as the use of continuous performance tests in ADHD, technologies used to facilitate the self-management of ADHD (Powell, Parker, & Harpin, 2018), telepsychiatry with children and adolescents (e.g. American Academy of Child and Adolescent Psychiatry [AACAP], 2017), and serious games for people with intellectual disability (Terras, Boyle, Ramsay, & Jarrett, 2018). By far the largest body of evidence is in the field of autism research, where multiple technologies, such as apps, telehealth, robots, and video modelling, have been implemented (Aresti-Bartolome & Garcia-Zapirain, 2014; Qi, Barton, Collier, & Lin, 2017).
Other reviews have focussed more broadly across technologies and disorders, such as Hollis et al. (2017) who looked at the effectiveness of digital health interventions across all mental health conditions in children and young people and Free et al. (2013) who summarised the evidence for mobile technology use within health services. Collectively, reviews investigating effectiveness have found that inherent methodological weaknesses such as poor-quality study designs, lack of control groups, and small sample sizes preclude any definitive conclusions on effectiveness.
Although effectiveness (as judged by a meta-analysis) is important in evaluating the validity of a technology, given the comparative lack of RCT evidence in relation to the number of technologies, it is important to include the wider literature, such as qualitative and case-study designs, to fully understand the current evidence base (Murray et al., 2016). Furthermore, the majority of reviews have focussed solely on young people; given the chronic nature of NDD, it is important that any evidence base includes available research on adult populations. The aim of this review is to highlight which technologies may be suitable for clinical adoption in NDD. To do this, we synthesise existing quantitative and qualitative research to provide a map of the current evidence of the use of technology within health services. Specifically, the review reports findings on the clinical/service effectiveness, economic impact and user impact (feasibility/acceptability) for available technologies to aid assessment, diagnosis, monitoring and treatment of NDDs. We include studies from service users (children/families and adults) as well as healthcare professionals (HCPs). We evaluate the quality of the evidence and highlight gaps in the current literature for the technologies.

Methods
The review protocol can be accessed on PROSPERO (CRD42018091156). The search was undertaken in accordance with the recommended principles in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher, Liberati, Tetzlaff, Altman, & Group, 2009).

Eligibility criteria and study selection
The search strategy was developed following a scoping review of the current literature on technologies, digital health interventions and NDDs, with input from an information specialist (EY). A combination of free text terms combined using Boolean logic (AND/OR) and controlled vocabulary headings (customised for each database) was used. The search parameters are outlined in Table 1 and a sample search for one database (MEDLINE) is presented in the online supplementary files (Supplementary Material 1).
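As an illustration of how free text terms can be combined with Boolean logic, the following minimal sketch joins synonyms with OR within each concept and joins the concepts with AND. The function name `build_query` and all search terms are hypothetical and chosen for illustration only; they do not reproduce the search strategy in Table 1 or the supplementary files, and database-specific controlled vocabulary headings are not modelled.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    blocks = []
    for terms in concepts:
        # Quote multi-word terms so they are searched as phrases
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical terms for illustration only; NOT the authors' search strategy
concepts = [
    ["autism", "ADHD", "neurodevelopmental disorder"],
    ["mobile app", "virtual reality", "robot"],
]

query = build_query(concepts)
print(query)
```

Each inner list represents one concept (here, condition and technology), so adding a third list would narrow the search further.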
The searches were initially conducted in March/April 2018 and updated in August 2019. Citations were downloaded and duplicates were removed using Endnote software. Additional studies were identified by reference mining eligible studies. The search revealed a very large number of papers (n = 19,448), which would have been too complex to synthesise meaningfully. As a result, we deviated from the protocol in that the search was limited to peer-reviewed academic papers published within the last 5 years (January 2014-August 2019); authors were not contacted for any unpublished data. Given the speed at which technology develops, it is arguably more meaningful to current clinical practice to have a more restrictive date of publication. Papers were also restricted to English language due to time constraints and lack of funds for translation.

Data extraction and mapping
Titles and abstracts were reviewed for relevance by two researchers (AZV/CLH). Full-text articles were obtained and reviewed by two out of three reviewers (AZV/CLH/BJB) according to the inclusion/exclusion criteria (Table 1). Papers that met the criteria for the review were subject to data extraction by two researchers (AZV/CLH), using a standardised data extraction tool developed for this study. The National Institute for Health and Care Excellence (NICE) definitions were used to clarify terms (see https://www.nice.org.uk/glossary) where appropriate. Clinical effectiveness was defined by whether or not the results of the study showed that a test or treatment resulted in symptom improvement. The economic evaluation was split into two areas which considered: 1) service delivery efficiencies (any savings to the service, e.g. reduction in clinical time), and 2) economic impact (whether any formal cost-analysis had been completed). The user impact was any impact reported by patients, families or healthcare professionals, relating to the feasibility and acceptability of the intervention. All responses were categorised into YES/NO (was/was not effective/efficient/acceptable), UNCLEAR (e.g. some outcomes were beneficial, some were not, or increased costs were seen in one area but saved in another), NEUTRAL (did not positively or negatively impact on the cost/cost savings or effectiveness), or NR (not reported). The papers were also reviewed for whether the authors made an evaluation as to whether or not the technology was suitable for clinical adoption. This was categorised as NEEDS MORE RESEARCH, YES/NO (is/is not suitable for adoption presently), UNCLEAR or not reported (NR). Any disagreements between reviewers were resolved through discussion until mutual consensus was reached.
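The categorisation scheme described above can be sketched as a simple record structure. This is a minimal illustration under stated assumptions, not the standardised extraction tool used in the review: the class names, field names and the example study are all hypothetical, and only the category labels are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Category labels for the four outcomes of interest (from the text)."""
    YES = "YES"
    NO = "NO"
    UNCLEAR = "UNCLEAR"
    NEUTRAL = "NEUTRAL"
    NR = "NR"  # not reported


class Adoption(Enum):
    """Author judgement on suitability for clinical adoption (from the text)."""
    NEEDS_MORE_RESEARCH = "NEEDS MORE RESEARCH"
    YES = "YES"
    NO = "NO"
    UNCLEAR = "UNCLEAR"
    NR = "NR"


# Hypothetical field names for the four outcome judgements
OUTCOME_FIELDS = (
    "clinical_effectiveness",
    "service_delivery_efficiencies",
    "economic_impact",
    "user_impact",
)


@dataclass
class ExtractionRecord:
    study: str
    clinical_effectiveness: Outcome
    service_delivery_efficiencies: Outcome
    economic_impact: Outcome
    user_impact: Outcome
    clinical_adoption: Adoption


def reported(record):
    """Return the outcome fields that carry a judgement other than NR."""
    return [name for name in OUTCOME_FIELDS if getattr(record, name) is not Outcome.NR]


# Hypothetical record for illustration only; not data from an included paper
example = ExtractionRecord(
    study="Example Study A",
    clinical_effectiveness=Outcome.YES,
    service_delivery_efficiencies=Outcome.NR,
    economic_impact=Outcome.NR,
    user_impact=Outcome.UNCLEAR,
    clinical_adoption=Adoption.NEEDS_MORE_RESEARCH,
)

print(reported(example))
```

Keeping NR as an explicit category, rather than leaving a field empty, makes it possible to count how many papers addressed each outcome at all.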

Quality assessment
The quality assessment tool proposed in our PROSPERO review protocol proved to be unsuitable because many studies were qualitative or mixed methods, thus requiring a broader assessment of quality than the one initially proposed. We therefore used the Oxford Centre for Evidence Based Medicine (OCEBM) Levels of Evidence to critically appraise the papers (OCEBM, 2011). The OCEBM approach is designed to allow clinicians to appraise evidence for questions that are clinically important. The scores range from 1 to 5 (1 = highest, 5 = lowest) with meta-analysis and RCTs typically being ranked high and qualitative papers/judgements being ranked low. Grades of recommendation are then applied where A is consistent with level 1 studies, B is consistent with level 2 or 3 studies or extrapolations from level 1 studies, C with level 4 or extrapolations from level 2 or 3 studies, and D with level 5 evidence or troublingly inconsistent or inconclusive studies of any level (OCEBM, 2011). Unlike some tools which are specifically designed to evaluate only the quality of a given methodology (i.e. qualitative papers or RCTs), the OCEBM allows the comparison of all papers using any methodology. Two authors independently assessed the levels of evidence (CLH, BJB). Disagreements were resolved through discussion.
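The correspondence between levels of evidence and grades of recommendation described above can be written as a small lookup. This is a simplified sketch: the function name `grade_for_level` is hypothetical, and the sketch deliberately ignores the extrapolation rules and the downgrading of inconsistent or inconclusive studies, both of which require reviewer judgement.

```python
def grade_for_level(level):
    """Map an OCEBM level of evidence (1-5) to a grade of recommendation.

    Simplified: extrapolations and inconsistent/inconclusive studies,
    which can lower the grade, are not modelled here.
    """
    if level == 1:
        return "A"
    if level in (2, 3):
        return "B"
    if level == 4:
        return "C"
    if level == 5:
        return "D"
    raise ValueError(f"OCEBM level must be an integer from 1 to 5, got {level!r}")


grades = [grade_for_level(level) for level in (1, 2, 3, 4, 5)]
print(grades)
```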

Results
Titles and abstracts of 7982 papers were reviewed and 808 citations were considered to be potentially using technology in a clinical capacity. Full texts of the 808 identified articles were reviewed, and 88 were selected based on the aforementioned inclusion and exclusion criteria. On first review it was evident that a substantial number of papers were reporting on telehealth, thus in order to provide a comprehensive synthesis of the large number of papers identified, the findings were split into those using telehealth technology (n = 41/88) and all other technologies (n = 47/88). The latter 47 papers are included in this review and the telehealth papers are published in a separate article. The identification and selection process is outlined in a flow diagram (Fig. 1).
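The counts reported in this paragraph can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the figures stated above; the dictionary keys are hypothetical labels, and the derived exclusion counts are simple differences rather than numbers taken from Fig. 1.

```python
# Counts as stated in the text; key names are illustrative labels
flow = {
    "titles_abstracts_reviewed": 7982,
    "full_texts_reviewed": 808,
    "met_criteria": 88,
    "telehealth": 41,
    "other_technologies": 47,
}

# The telehealth and non-telehealth subsets should account for all selected papers
subsets_sum = flow["telehealth"] + flow["other_technologies"]

# Derived by subtraction (not quoted from the flow diagram)
not_taken_to_full_text = flow["titles_abstracts_reviewed"] - flow["full_texts_reviewed"]
excluded_at_full_text = flow["full_texts_reviewed"] - flow["met_criteria"]

print(subsets_sum == flow["met_criteria"], not_taken_to_full_text, excluded_at_full_text)
```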

Study characteristics
Study design and quality appraisal
The majority of studies did not use an RCT design, with only 13/47 utilising this approach. Out of these, six used a waitlist control (i.e. no intervention) as the comparator group. The other studies used an active control, which was typically "treatment as usual" (TAU), although TAU was usually not clearly defined. Two studies used a cross-over design between TAU and the intervention. Although waitlist controls are often a more affordable study design, it is important to note that this type of control group may overinflate the treatment effect size of the intervention relative to an active control, and this should be considered when interpreting the results. Other study designs included a mixture of case reports and series, non-randomised designs, surveys and qualitative information. As a result, the most commonly occurring quality judgement was a rating of 4 or 5, which would rank the overall level of evidence as grade C or D. Table 2 presents the study designs and quality judgements.
Note: * Due to the large volume of papers identified and the rate of technology advances, the search was further restricted by date and only included published academic papers.
The main quality limitation was low sample sizes. Sample sizes ranged from 1 to over 2500 participants, but the majority of studies (29/47) included fewer than 25 participants and thus were not sufficiently powered to address the primary study question. Not surprisingly, the smallest samples came from case studies and the larger sample sizes typically from the RCTs.

Patient characteristics and region
The majority (39/47) of papers reported data from a male or predominantly male sample. In most studies (39/47), participants were recruited from healthcare settings (hospitals, clinics, community healthcare); the remaining studies involved health clinics run within university settings (n = 7) or residential units (n = 1). Most studies (39/47) reported data from child patients and/or their parents/carers; in the majority of these (34/39), the children were 13 years old or younger. Only five studies considered adult service users and/or healthcare professionals and one study included sibling participants.
A.Z. Valentine, et al. Clinical Psychology Review 80 (2020) 101870

Diagnoses and comorbidities
The greatest number of studies were conducted on ASD; studies on communication disorders were the least represented (see Table 3). Twenty-three papers did not report co-morbidities. The remaining papers reported a wide range of additional diagnoses, typically another co-existing NDD (n = 15), ODD/conduct disorder (n = 7) and anxiety (n = 5).

Purpose and type of technology
Papers were classified with regards to the clinical use of the technology, that is, whether the primary focus of the technology was to screen for or identify a condition (diagnosis/screening), to observe progress (monitoring) or to treat an already diagnosed NDD (treatment). There were substantially more papers assessing the clinical use of technology for monitoring/treating NDD (41/47) than for diagnosis/screening (5/47), with one paper considering all areas (Hall et al., 2017). As shown in Table 4, seven different technology areas were found, the most popular of which were tablets/mobile apps and gaming.
Just over half of the papers that reported efficacy (26/41) were judged to be clinically effective. However, very few papers assessed the other outcomes of interest. From the papers that did consider these areas, six studies produced clear service delivery efficiencies (6/7) and three had a positive economic impact (3/4). The user impact was slightly better represented with 16 papers stating that users were positive towards the technology (16/23). Where authors noted whether the technology was suitable for clinical adoption, most studies (38/46) concluded that the technology required further research, with four stating the technology was suitable for adoption or had been already implemented, three stating it was not, and the author judgement was unclear in one study. The following presents the findings for each condition along with the judgements on these five factors (clinical effectiveness, service delivery efficiencies, economic impact, user impact, suitability for clinical adoption).
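For readers who prefer percentages, the fractions quoted above can be converted as follows. The numerators and denominators are taken directly from the paragraph; the dictionary labels are shorthand introduced here, and the rounded percentages are derived values that the original text does not report.

```python
# (numerator, denominator) pairs as quoted in the text; labels are shorthand
counts = {
    "clinically effective": (26, 41),
    "service delivery efficiencies": (6, 7),
    "positive economic impact": (3, 4),
    "positive user views": (16, 23),
    "needs further research": (38, 46),
}

# Percentage of papers, rounded to one decimal place
percentages = {
    label: round(100 * numerator / denominator, 1)
    for label, (numerator, denominator) in counts.items()
}

# Adoption judgements quoted in the text: further research, suitable, not suitable, unclear
adoption_total = 38 + 4 + 3 + 1

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The small denominators for service delivery efficiencies (7) and economic impact (4) are a reminder that these percentages rest on very few papers.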

Autism Spectrum Disorder (ASD): Assessment/Diagnosis
Two papers looked at the use of a tablet to aid in assessment screening (Table 5). Campbell et al. (2017) found an improvement in accuracy of health records and more appropriate treatment action as a result of the intervention; clinicians valued the digital checklist tool to improve their clinical assessment. Brooks et al. (2016) found an increase in the number of cases able to be screened for ASD with no difference in numbers screened positive; the authors also assessed feasibility by comparing screening rates via web and paper methods and found that web-based screening was an efficient and feasible way to screen more young children. Campbell et al. (2017) and Brooks et al. (2016) reported that the systems using the tablets to administer the M-CHAT were retained after the study. Both papers documented service delivery efficiencies or economic impact. Brooks et al. (2016) noted that the netbooks used in the study to deliver web-based screening were fairly low cost (< $200) and significantly reduced incomplete follow-up screening (p < .001), as the follow-up items were automatically triggered as soon as initial screening highlighted the need for follow-up. Similarly, Campbell et al. (2017) found electronic screening led to a decrease in the number of positive screens, reducing the number of follow-up physician visits.

Autism Spectrum Disorder (ASD): Monitoring
No papers reported on the use of digital technology to monitor people with ASD.

Autism Spectrum Disorder (ASD): Treatment
As the large majority of papers obtained in this review are concerned with the treatment of ASD, this section is further subdivided into the technology types. Although it is appreciated that in some studies there is a degree of overlap of these technologies, given the vast number of studies in this section it was deemed appropriate to sub-divide the findings.
Where technologies could fit into more than one technology subgroup (for example gaming on a tablet) the primary technology, as given by the authors of each paper, was used.
Clinical effectiveness: Five papers reported gaming to be clinically effective and one reported a lack of clinical effectiveness. Bono et al. (2016) developed a closed-loop computer gaming system that allows interaction between therapists/caregivers and participants with ASD and found the gaming platform "GOLIAH" encouraged child-therapist interactions and allowed therapists to alter their treatment as needed, although no statistical data is presented. In contrast, in a non-randomised feasibility study using GOLIAH compared to treatment as usual, Jouen et al. (2017) found no significant group effect for any measures (p > .05). Caro et al. (2017) found an exercise game supported eye-body coordination in children with severe ASD. Two papers used a gaming intervention to support therapy sessions and found an improvement in target behaviours relating to social interactions such as smiling, visual contact, sharing of emotions etc. (Malinverni et al., 2017) and responding to a greeting (Uzuegbunam et al., 2018). Serret et al. (2014) found that a game which also incorporated aspects of virtual reality was effective in teaching emotional awareness.
User impact, feasibility and acceptability: Three papers reported that users perceived good acceptability (Bono et al., 2016; Caro et al., 2017; Malinverni et al., 2017). Jouen et al. (2017) noted that although no-one dropped out of the treatment, fewer sessions were completed than was anticipated (< 40% completion), questioning the acceptability of the GOLIAH intervention. The remaining papers did not report acceptability or user impact (Serret et al., 2014; Uzuegbunam et al., 2018).
Service delivery efficiencies and economic impact: No papers reported on service efficiencies or economic impact.
Readiness for clinic adoption: Despite the overall trend towards finding gaming to be effective at improving symptoms, all papers reported that further research was required before the technology was suitable for clinical adoption. In general, gaming was positively reported to be acceptable, but further research on service efficiencies/ economic benefit is required.

Tablet / Mobile phone applications (apps). Clinical effectiveness: As seen in Table 7, nine papers focussed on treatment in an ASD sample. Five of these papers reported the tablet/mobile app to be clinically effective/partially effective (Agius & Vance, 2016; Brodhead et al., 2018; Law et al., 2018; Lee et al., 2015). Tablets/mobile apps were found to be effective in teaching requesting skills (Agius & Vance, 2016), increasing varied play through an activity schedule (Brodhead et al., 2018), and helping parents improve communication with their child with ASD (Law et al., 2018). Lee et al. (2015) compared therapist-implemented and tablet-assisted interventions and found the tablet resulted in less challenging and more on-task behaviours for one participant, but there was no difference for the second participant, indicating that individual preferences for interfaces may play a moderating role. Two further papers utilised larger sample sizes but were still relatively small RCTs. Novack et al. (2019) found that 15 children who used a mobile app (Camp Discovery) based on applied behaviour analysis (ABA) significantly improved their receptive language skills compared to the 13 children in the control group. Parsons, Cordier, Lee, Falkmer, and Vaz (2019) conducted an exploratory RCT of the TOBY app and found only limited effectiveness, with some skill acquisition when data was pooled, but high dropout rates. Two papers reported the tablet was not clinically effective. Fletcher-Watson, Pain, Hammond, Humphry, and McConachie (2016) conducted an RCT on 54 children and found no difference in social communication skills after engaging with the tablet intervention. Jeffries, Crosland, and Miltenberger (2016) found the tablet was not effective as a method to deliver treatment to increase eye contact in a sample of three children.
User impact, feasibility and acceptability: Four papers documented user impact; however, most papers did not provide user feedback or provided only limited feedback. Where feasibility and acceptability were assessed, these were generally positive (4/4). Agius and Vance (2016) and Law et al. (2018) reported that parents found the tablet very acceptable to use; Fletcher-Watson et al. (2016) and Law et al. (2018) noted that parents rated the intervention as being highly acceptable (see Table 6). In one study the user acceptability was unclear: in a continuation of the previously mentioned TOBY app study, Parsons, Wilson, Vaz, Lee, and Cordier (2019) detail the experiences of 24 parents who used the app. The core theme they highlighted was that "The TOBY App is Not a Panacea", that it was only one part of the jigsaw of treatment and needed to be used as a complementary therapy in conjunction with support from a therapist (p. 4058). They also noted that the individuality of families was not accommodated in the app, with families feeling there was a lack of control and choice.
Service delivery efficiencies and economic impact: Three papers documented service delivery efficiencies or economic impact. Lee et al. (2015) reported that sessions were shorter with the iPad. Law et al. (2018) suggest that mobile apps could reduce waiting times for interventions, and Parsons, Wilson, et al. (2019) reported that some parents felt the app could reduce the number of face-to-face sessions required, thus decreasing the distance travelled. These findings all allude to service delivery efficiencies, but no further service delivery efficiencies or costs were reported in any of these papers.
Notes: NR = not reported; Population: ASU = adult service users, CYP = children and young people, HCP = healthcare professionals, P/C = parents/caregivers; Gender: Mixed m = mixed predominantly male, Mixed F = mixed predominantly female.
Readiness for clinic adoption: Two papers concluded that the technology they reported on, "Look in My Eyes Steam Train" (Jeffries et al., 2016) and an un-named iPad app (Fletcher-Watson et al., 2016), was not suitable for clinical adoption. The remaining papers concluded more research was required.
To summarise, there are mixed findings on the effectiveness of interventions for children delivered via tablets. However, the available evidence suggests this technology is considered user-friendly and thus is worthy of further pursuit.
Video/DVD/video-modelling. Video modelling is a visual teaching method in which individuals watch a video of a person modelling a target behaviour or skill, with the aim that the individual watching will then imitate the target behaviour. The six papers that reported on the use of video/DVD/video-modelling all used it for treatment (see Table 8).
Clinical effectiveness: Most papers (5/6) reported on clinical efficacy; however, sample sizes were small, with four papers reporting three or fewer participants (Kern Koegel, Ashbaugh, Navab, & Koegel, 2016; Kourassanis, Jones, & Fienup, 2014; Radley et al., 2015; Stewart & Umeda, 2014). Three of these papers reported positive clinical effectiveness: video-modelling was found to increase empathic communication (Kern Koegel et al., 2016), improve social game behaviours (Kourassanis et al., 2014) and improve social skills accuracy (Radley et al., 2015). The findings of Stewart and Umeda (2014) were more mixed, reporting that it was effective in teaching motor imitation only in some children. In a larger study with 38 participants, Dai et al. (2018) used a DVD to deliver an ASD parenting intervention and reported mixed results, with parents' confidence about their parenting abilities significantly increasing, knowledge increasing slightly, but self-efficacy remaining constant.
User impact, feasibility and acceptability, service delivery efficiencies and economic impact: In the largest study, involving 67 families, Bagaiolo et al. (2017) found good compliance rates in an RCT of video-modelling for parent training. The authors concluded that further research on effectiveness was needed but the intervention was a "feasible and low-cost way to deliver care" (p. 603); however, they did not present empirical data to support this. Dai et al. (2018) reported that parents rated the DVD parenting intervention highly, describing it as acceptable, clear, and effective. No technology problems were reported and the DVD format was seen as accessible to all parents.
Other than as reported above, none of these studies reported on service delivery efficiencies, or economic or user impact.
In summary, the majority of research in video-modelling has looked at improving social communication in ASD, with some evidence of clinical effectiveness in very small samples. There is potential for this approach to result in service delivery efficiencies, but further research specifically relating to cost-effectiveness is required and the technology may need further refinement.

Robots. As evident in Table 9, of the six papers that used robots, all were aimed at improving treatment delivery for children and young people with ASD, all 13 years of age or under, and in one case also involving siblings (Huskens, Palmen, Van der Werff, Lourens, & [...]). Sample sizes were small in all robot studies (5-15 participants).
Clinical effectiveness: All robot papers reported on clinical effectiveness. Four papers used the NAO robot, a small humanoid robot designed to interact with people by walking, dancing, speaking, and recognising people and objects. Barakova, Bajracharya, Willemsen, Lourens, and Huskens (2015) used the NAO robot to assist a trained therapist in a Lego therapy session which also involved other children. They found an increase in social interactions and engagement throughout the sessions. Although these are promising results, the authors noted the design needed improving to allow the robot to cope with a wider range of scenarios that may present in a therapy situation.
In an RCT using the NAO robot, to assist in the delivery of cognitive behavioural group therapy based on Rational Emotive Behaviour Therapy (REBT) principles, Marino et al. (2019) found substantial improvements in emotion recognition, comprehension and emotional perspective-taking in children with ASD. In contrast, David, Costescu, Matu, Szentagotai, and Dobrean (2019) found that the NAO robot and standard human therapy produced similar improvements in turn-taking and Huskens et al. (2015) did not find any improvement in collaborative behaviours following Lego-based therapy with NAO, and thus do not support the clinical effectiveness of robots. Kwon, Lee, Mun, and Jung (2015) tested a cat robot to assist trained language and play therapists and found an improvement in positive meaningful social interactions throughout the sessions. However, there were lower rates of interest as the sessions continued. Yun, Choi, Park, Bong, and Yoo (2017) compared behavioural therapy delivered by a robot with that of a human assistant and found both groups resulted in significant improvement in positive interactions, but with no significant differences between robot and human assisted groups. They concluded that further research was needed.
User impact, feasibility and acceptability: User acceptability/feasibility was not formally assessed in any paper; however, Barakova et al. (2015) found that the children's responses to the robot were mixed, with some children reporting during the sessions that they liked the robot and others reporting that they hated it. Yun et al. (2017) reported that children were very willing to engage with the CARO robot, and Marino et al. (2019) commented that they had no dropouts and that children showed high interest and sustained attention throughout the intervention. David et al. (2019) argue that because children looked more at the robot than at the human equivalent, they were more interested in the robot therapy.
Service delivery efficiencies and economic impact: No papers reported information on economic impact or service delivery efficiencies.
Readiness for clinic adoption: To summarise, there is some evidence to show clinical effectiveness of incorporating robots into therapeutic sessions for children with ASD, however, this requires substantial further research.
3.3.3.5. Virtual reality. Virtual Reality (VR) simulates real world scenarios using computer graphics and usually a head mounted display, headphones and hand controllers. As is clear in Table 10, three papers report on virtual reality (VR) all involving a child and young person sample, with one paper (Mraz et al., 2016) also including adult service users.
Clinical effectiveness: All papers document some degree of effectiveness. One paper reports the case study of a female with Rett Syndrome and found the VR therapy decreased the number of stereotypies and improved functional movements. A second paper reports six female case studies who also used VR to improve upper extremity movement, which was successful for those who engaged. The final case study (De Luca et al., 2019) found one month of combined cognitive behavioural therapy and VR cognitive treatment resulted in improvements in attention and spatial cognition skills, as well as a significant reduction of stereotypies in a 16-year-old male with ASD.
User impact, feasibility and acceptability: One paper noted technical glitches and the need to identify games to match the individual's preference to improve engagement. De Luca et al. (2019) noted that the patient was motivated and engaged in the VR treatment. No other studies reported user impact, although high rates of completion may indicate acceptability.
Service delivery efficiencies and economic impact: No papers report on service efficiencies or economic impact.
Readiness for clinic adoption: In summary, there is some provisional, limited evidence that VR may be effective for treating NDD, however, all papers on VR reported further research was required.

Attention Deficit Hyperactivity Disorder (ADHD): Assessment/Diagnosis
As seen in Table 11, eight papers reported on ADHD: three used the QbTest (a computer test of attention and impulsivity with a measure of activity) in assessment/diagnosis (Hall et al., 2017; Hollis et al., 2017), one used an app to monitor ADHD (Weisman et al., 2018), and four involved treatment utilising gaming (Bul et al., 2016; van der Oord, Ponsioen, Geurts, Ten Brink, & Prins, 2014), virtual reality (Bioulac et al., 2018) and video feedback (Wilkes-Gillan et al., 2017).
Three papers reported the use of the QbTest to assess and monitor ADHD; all studies involved children/young people and their parents/carers (see Table 11). QbTest is a computer test which provides an objective report of attention, impulsivity, and activity, which can be added to the results of clinical assessment and rating scales in the ADHD assessment process.
Clinical effectiveness: Hollis et al. (2018) found a 26% increase in diagnoses made within 6 months in families who received the QbTest compared to when the QbTest results were withheld; clinicians' confidence in their diagnoses was also increased with the QbTest.
User impact, feasibility and acceptability: Hall et al. (2017) found that the QbTest was generally considered to be acceptable and feasible to implement as part of both assessment and medication practice in ADHD.
Service delivery efficiencies and economic impact: Hall, Selby, et al. (2016) found implementing QbTest in routine assessment reduced the number of appointments to confirm an ADHD assessment and resulted in cost-savings. This was supported by a later RCT conducted by Hollis et al. (2018b), who found that the QbTest resulted in service efficiencies including reduced time to make a diagnosis and did not affect diagnostic accuracy. This was one of the few studies to conduct an economic evaluation and report empirical data indicating the intervention to be cost-neutral and ready to implement in clinical practice.
Notes: NR = not reported; Population: ASU = adult service users, CYP = children and young people, HCP = healthcare professionals, P/C = parents/caregivers; Gender: Mixed m = mixed predominantly male, Mixed F = mixed predominantly female.
A.Z. Valentine, et al. Clinical Psychology Review 80 (2020) 101870
Readiness for clinic adoption: The evidence shows that QbTest may be promising for improving the assessment of ADHD in children; however, further research is needed to ascertain clinical and cost-effectiveness for medication management and for use with adults.

Attention Deficit Hyperactivity Disorder (ADHD): Monitoring
One paper (Weisman et al., 2018) looked at the use of a mobile app to aid monitoring of ADHD medication; this was found to be effective at improving adherence, but did not affect the severity of ADHD symptoms. However, the study was sponsored by the same pharmaceutical company that developed the app and had a relatively high dropout rate (61.5% completion at third visit). The service efficiencies, economic impact, and user impact were not reported and the authors suggested further research was necessary before the technology was suitable for clinic adoption.

Attention Deficit Hyperactivity Disorder (ADHD): Treatment
Clinical effectiveness: Three studies conducted in ADHD treatment showed promising clinical effectiveness. Bul et al. (2016) used a serious game delivered at home and found it improved life skills. van der Oord et al. (2014) reported gaming to be successful in improving executive functioning in children with ADHD when compared to a wait-list control. One paper reported on video-modelling for ADHD (Wilkes-Gillan et al., 2017) and found increased social skills playing for some children, but concluded that the technology may need to be adapted to different developmental stages and was restricted in children with high levels of Oppositional Defiant Disorder (ODD).
In the largest VR study (n = 59), Bioulac et al. (2018) compared a virtual classroom with methylphenidate or psychotherapy (active control) treatment. The cognitive training delivered through VR was as effective as methylphenidate for reducing distractibility and impulsivity in children with ADHD. However, even though larger than other VR ASD studies, the sample size was still relatively low and subgroup analysis was not possible.
User impact, feasibility and acceptability: Two papers reported good acceptability (Bul et al., 2016;Wilkes-Gillan et al., 2017). The remaining papers did not report acceptability or user impact.
Service delivery efficiencies and economic impact: No papers reported on service delivery efficiencies or economic impact.
Readiness for clinic adoption: All studies reported that further research was necessary before implementing technology in clinics. Further development of the technology was recommended by Wilkes-Gillan et al. (2017). In summary, preliminary research in ADHD needs to be expanded, possibly incorporating video-modelling with peers in school environments and using larger sample sizes.

Learning Disabilities & Specific Learning Disorders (LD and SLD): Treatment
Clinical effectiveness: Within learning disabilities and specific learning disabilities, no papers reported on the use of digital technology to assess, diagnose, or monitor; five papers addressed treatment (see Table 12). Of these five studies, two used gaming as treatment for people with intellectual disabilities or developmental delay. One study used "Hot Plus", a commercial video game from Taiwan (Hsieh, Lee, & Lin, 2016), to deliver treatment to children with developmental delay. Although they found the physical health of children improved, no difference in functional performance or family impact (including measures of family function, satisfaction with child's care, and quality of life measures) was noted, indicating mixed findings on clinical effectiveness. In the only gaming study on adults, Garcia-Villamisar, Dattilo, and Muela (2017) looked at the impact of games to improve social skills in adults with ASD and Intellectual Disability (ID) and found a positive impact on symptoms. Aspects of gaming were also reported by Hallas and Cleaves (2016), who documented the use of technology within a learning disabilities service. The only aspect of the paper which met inclusion criteria for this review was a case study of an adult with ID and autism who was able to explore relationships more effectively by incorporating gesture-based technology as part of his therapy sessions. Pedroli et al. (2017) evaluated VR training in 10 children with dyslexia and found that all participants showed improvement in attention but no immediate improvement in reading. The final exploratory study used a magic room (a sensory smart place integrating various technologies) for children with ASD/ID and found caregivers reported cognitive improvements, although the study was limited by the lack of a control group and a small sample (Garzotto & Gelsomini, 2018).
Table 12. Summary of ID/LD/SLD studies.
User impact, feasibility and acceptability: Although the magic room was generally assessed in positive terms by the parents/carers, some negative behaviours were seen during therapy. For example, some children had difficulty in self-regulation and control of emotions resulting in over-excitedness leading to pushing other children, kicking and aggressive behaviour (Garzotto & Gelsomini, 2018). Hallas and Cleaves (2016) document that in general the technology was seen favourably by patients and HCPs and with regards to the case study, the patient's community nurse reported that the technology helped the patient to engage. No other studies reported on user impact.
Service delivery efficiencies and economic impact: No papers reported on service delivery efficiencies, and only Garzotto and Gelsomini (2018) reported briefly on economic impact, documenting that the magic room was cheaper to implement than the standard multisensory room typically used. Although these authors broadly report costs, a formal economic evaluation was not conducted.
Readiness for clinic adoption: All technology was reported as requiring further research prior to clinic adoption.

Communication disorders: Assessment/diagnosis
Within communication disorders, one paper focussed on assessment (see Table 13). Mendes, Dacakis, Block, and Erickson (2015) looked at the use of an Ambulatory Phonation Monitor (APM) to assess social participation in three male adults who stutter. They found adults tolerated wearing the device but it was reported as cumbersome so requires further modification. The authors reported the cost (AUD $10,000) of the equipment but did not report on any cost-savings. Further research is therefore necessary before clinic adoption.

Communication disorders: Monitoring
No papers reported on the use of digital technology to monitor people with communication disorders. In terms of treatment, Lorusso, Biffi, Molteni, and Reni (2018) reported on the use of a tablet to aid speech therapy in children with communication difficulties. They reported high levels of user satisfaction, but further research is necessary prior to clinic adoption (see Table 13).

Discussion
The purpose of this systematic review was to examine the evidence base for the clinical use of technology within the NDD field, specifically looking at the assessment/diagnosis and monitoring/treatment. Of the 47 studies included in this review, a range of technologies were identified: gaming, mobile/tablet apps, video/DVD, robots, virtual reality, and QbTest. Similarly, a broad spectrum of NDDs were identified in the review. By far the greatest number of studies were conducted on ASD. Perhaps this is not surprising given that people with ASD may particularly benefit from technology to overcome communication difficulties as it is less socially threatening than face-to-face therapy (Goodwin, 2008). This also represents the more prevalent NDDs (ASD, ADHD, and ID/LD). The studies were largely heterogeneous in terms of range of technologies and NDD; even within similar papers (same technology/ NDD) the intervention, study design, follow-up periods and outcome assessments were extremely varied, making it difficult to compare results.

Strength of evidence for each outcome
The outcomes of interest in this review were: clinical effectiveness, user impact, service delivery efficiencies and economic impact, and readiness for clinic adoption.

Clinical effectiveness
The interventions in 26/47 papers presented some clinically effective findings. These included improvement/reduction of: symptoms (e.g. hyperactivity), parental distress and depression, social skills, as well as general effectiveness of treatments and diagnostic accuracy. Statistical comparisons were not appropriate given the wide-ranging nature of outcomes both across and within conditions. Overall, 55.3% of papers reported positive results for clinical effectiveness. However, these findings must be interpreted with caution as many of these were small studies and/or pilot studies. Further research is necessary utilising larger sample sizes. Research is also necessary to explore how the results of these studies generalise to everyday life, as few studies included follow-up data.

User impact, feasibility and acceptability
A large body of papers identified in this review used technology with parents/caregivers and/or with children and young people. Adult service users were not so well represented, reflecting a more limited evidence base that technology can be used with adults and also a tendency for research to focus on children and young people in NDD. However, the feedback from parents implied that adults were generally willing to embrace technology. This is in support of other reviews which have shown that families and professionals feel that it is ethically acceptable to use robots in ASD therapy (Richardson et al., 2018). The technology was also used by people with a range of disabilities, which resulted in greater participation in healthcare and allowed a better therapeutic relationship to develop, for example by providing a more suitable and structured environment for children with learning disabilities.
Likewise, the responses from studies which looked at HCPs' views of technology were largely positive; however, again this represents a small subsection of HCPs who are willing to be involved in technology research. HCPs' perspectives and the lack of a robust evidence base are potentially among the largest barriers to the wider acceptance of technology. This review has shown that although there is some progress in technology being used clinically, there is a barrier between the use of technology in research and its use clinically (evidenced by the large volume of papers excluded because the technology was not used clinically or was still in development). The difficulty in publishing papers is often that technology is an area of rapid development and, by the time of publication, the technology has often been further developed or optimised. Practitioners are guided by evidence-based practice, but in order to attain the evidence base, clinicians have to take a leap of faith and use the technology in its infancy. In some cases where this has been done, it has been integrated into services with relative ease, for example by Hallas and Cleaves (2016), who introduced digital technologies across all professions and services in a UK Learning Disabilities Service. In other areas, such as assessment of NDDs, we are aware that technology is being used in clinical practice, but in the present review we found limited published evidence about efficacy, costs, or user impact. More pragmatic trials and/or service evaluations are needed to evaluate the real-world impact of technology which is already being used clinically.
One overarching theme that was evident across technologies was the idea that the technology is best when it is matched to each individual's or family's needs. This has been highlighted in other reviews; for example, Richardson et al. (2018) noted that the diversity of children with ASD must be considered when looking at robot therapy, as children face varied challenges, and Aresti-Bartolome and Garcia-Zapirain (2014) highlight that personalised tools to meet individuals' needs are essential to promote engagement with the technology.

Service delivery efficiencies and economic impact
Although many studies reported service delivery efficiencies, such as fewer sessions/reduced minutes in consultations (e.g. Hollis et al., 2018a), formal cost analysis within studies was very limited, perhaps reflecting research designs that are typically focussed around effectiveness. However, when looking at the suitability to adopt a technology in clinical practice, these questions are key and this area undoubtedly requires more research.

Readiness for clinic adoption
Although there has been a proliferation of studies in recent years, due to the mixed results, in many cases there is a need for further clinical research, using more rigorous study designs before implementation in practice. As evidenced by the number of papers identified, technology is undoubtedly a fast moving field and it is not always possible to conduct and publish rigorous research studies at a suitable pace. A review of conference publications may yield more promising results of upcoming technology, however, the evidence base for these is likely to be even more limited. Looking both across and within NDDs there are a wide range of children's needs and capabilities, which add to the uncertainty of findings. This review highlights that the majority of research has focused on technology use within ASD and ADHD, but further research would be useful to identify which patients are most likely to benefit from technology given the overarching theme relating to tailoring technology to individual's needs and personalities.
Although some studies were clinical trials, the majority of data presented was of mid- to low-quality and the findings should be interpreted with caution. This was generally because of small sample sizes and more qualitative/reflexive study designs; however, RCTs are time consuming and do not always lend themselves to real-world evaluation. Furthermore, nearly half of the RCTs identified used a wait-list as the control group. It is likely that any treatment effects are inflated when comparing to a non-active control group. To understand the specific effect of utilising technology as the mode of delivery for the intervention, RCTs should include an active control such as face-to-face delivery or an active treatment as usual (TAU). Our review also noted that the description of what constitutes TAU should be more clearly defined in papers. Our findings support previous research which has highlighted that the strength of the evidence is limited by methodological weaknesses including poor-quality study designs, small sample sizes, and lack of appropriate control groups.

Evaluation of the study
The findings must be considered within the limitations of the study. The review was much more wide-ranging than anticipated, with many different technologies being identified in the search. However, the areas were in line with previous reviews in the field of autism, e.g. Aresti-Bartolome and Garcia-Zapirain (2014), who found virtual reality, apps, telehealth and robots were the main technology categories. It appears that the research on video-modelling has decreased in recent years, perhaps because of the large body of previous research and demonstrated evidence base (Qi et al., 2017). The search terms were developed with an information specialist (EY) and were deemed by others on the steering committee to be comprehensive. Due to the number of papers obtained, limits had to be put on the types of papers included in this review. The review was therefore limited to studies in the English language, which could have resulted in papers being omitted. Although we do not expect this to significantly affect the findings, it could limit generalizability. We also deviated from the protocol by removing the grey literature search due to the significant number of relevant papers. Despite this, the review was methodologically rigorous, data extraction was thorough and a methodical process was followed by the researchers to ensure the data mining was complete. The review was strengthened by the evaluation of the papers by at least two members (three if there was a disagreement), to improve the integrity of the findings.

Future research
With regards to knowledge gained from the review, there are several areas for future research identified. In particular there is a need for further studies utilising RCTs with a health economic evaluation of the technology, as well as non-randomised designs aimed around process evaluation and qualitative feedback on the user acceptability and feasibility of the intervention. This review revealed this is currently missing from the majority of studies, but is vital to inform on the clinical utility of the technology.
The present review was limited in that studies were only included if participants had been recruited from a clinical setting. This meant that many studies were excluded because participants were recruited online or via a community service which was not healthcare related. In addition, a number of studies were delivered by healthcare professionals in education settings but these were also excluded from the present review, which will have reduced the number of identified papers on disabilities treated more commonly within school (e.g. speech disorders and learning difficulties such as dyslexia). A future review could explore these specific groups.
Similarly, a number of interesting studies presented data from technology piloted on typically developing children and as such were excluded from this review, however, further research could explore technology designed for people with NDD but currently in the developmental phases, which may aid the implementation of technology into health services.
It is also possible that technology is being used within health services but that this is not being formally evaluated and/or not being written up for publication. Further research could survey healthcare services regarding their use of technology. For example, Hassan et al. (2018) surveyed movement disorder society members globally and found that half of the sample who returned questionnaires (n = 287/549) reported using telehealth. This is obviously a biased sample, as arguably those who responded to the survey were the practitioners with an interest in technology; however, further research in this area may be fruitful.

Conclusions
The current review aimed to present the evidence for the clinical use of technology in assessing and treating NDDs. Technologies were evaluated on three core aspects: clinical effectiveness, user acceptability and service delivery efficiencies. The review found mixed evidence for clinical effectiveness; however, there is a lack of robust evidence to support user acceptability, and even fewer studies report on service delivery efficiencies. To ensure technologies are effectively implemented into clinical practice, there is a need to invest in larger randomised controlled trial designs with longer follow-up periods, as well as studies investigating the economic and user impact of technology in healthcare settings. To date, the area of focus in terms of technology has been mobile and tablet apps, and the area of focus in terms of disorder has been ASD. If the value of technologies is to be fully realised, further research needs to evaluate the potential effectiveness of a range of technology platforms across a range of NDDs. It is vital that clinicians, academics, patient and public involvement groups and industry work collaboratively to develop technology that addresses a clinical need, is relevant and acceptable to end users, and considers each individual's needs.

Role of funding sources
This research was funded by the NIHR Collaboration for Leadership in Applied Health Research and Care East Midlands (CLAHRC-EM; RC4816). The funders had no role in the study design, collection, analysis or interpretation of the data, writing the manuscript, or the decision to submit the paper for publication.
Disclaimer: The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Contributors
CLH designed the study and wrote the protocol. EY and AZV conducted the literature searches. AZV, CLH, BJB reviewed the abstracts and titles for inclusion. AZV and CLH extracted the data and drafted the manuscript. CLH and BJB completed the risk of bias assessment. Final processing and summarising of the data was conducted by AZV. AZV wrote the first draft with support from CLH. CLH supervised the process with support from CH. MJG and CH provided feedback on the protocol and manuscript. All authors contributed to and have approved the final manuscript.

Declaration of Competing Interest
CLH and CH acknowledge support from the NIHR Health Technology Assessment (HTA) (Ref 16/19/02) and NIHR MindTech MedTech Co-operative. CH additionally acknowledges the support of the NIHR Nottingham Biomedical Research Centre. All other authors declare that they have no conflicts of interest.