Virtual Reality Aggression Prevention Therapy (VRAPT) versus Waiting List Control for Forensic Psychiatric Inpatients: A Multicenter Randomized Controlled Trial

Many forensic psychiatric inpatients have difficulties regulating aggressive behavior. Evidence of effective aggression treatments is limited. We designed and investigated the effectiveness of a transdiagnostic application of a virtual reality aggression prevention training (VRAPT). In this randomized controlled trial at four Dutch forensic psychiatric centers, 128 inpatients with aggressive behavior were randomly assigned to VRAPT (N = 64) or a waiting list control group (N = 64). VRAPT consisted of 16 one-hour individual treatment sessions twice a week. Assessments were done at baseline, post-treatment and at 3-month follow-up. Primary outcome measures were aggressive behavior observed by staff and self-reported aggressive behavior. Analysis was by intention to treat. This trial was registered in the Dutch Trial Register (NTR, TC = 6340). Participants were included between 1 March 2017 and 31 December 2018. Compared to waiting list, VRAPT did not significantly decrease self-reported or observed aggressive behavior (primary outcomes). Hostility, anger control, and non-planning impulsiveness improved significantly in the VRAPT group compared to the control group at post-treatment. Improvements were not maintained at 3-month follow-up. Results suggest that VRAPT does not decrease aggressive behavior in forensic inpatients. However, there are indications that VRAPT temporarily influences anger control skills, impulsivity and hostility.


Introduction
Aggression among forensic psychiatric inpatients is highly prevalent: recent estimates indicate that between 31% and 59% of patients commit at least one violent assault during hospitalization [1,2]. Inpatient aggression threatens the safety and wellbeing of both patients and staff [3,4]. Aggression regulation is a main treatment goal in forensic psychiatry, as it is a prerequisite for successful release and re-socialization. Effective aggression interventions are urgently needed, but the body of research on psychosocial interventions in forensic populations is small.
A systematic review of psychological therapies designed for violent behavior in clinical and forensic settings found ten studies, providing tentative support for the effectiveness on aggressive behavior. However, the review included only two randomized controlled trials, both of cognitive behavioral therapy, one of which had negative outcomes [5]. Another more recent systematic review included 16 studies on the effect of Aggression Replacement Training (ART) on antisocial behavior in young people and adults [6]. Although results indicated positive effects of ART on recidivism rates and secondary outcomes (e.g., social skills), only four studies involved adult samples, and the majority of the included studies were of average quality because of non-randomized designs, selection bias, and small sample sizes.
The paucity of evidence on the effectiveness of aggression therapies may be related to the characteristics of forensic populations. A meta-analysis on the efficacy of psychological treatments for violent offenders found significant treatment effects only on community recidivism, and not on institutional misconduct [7]. The authors offer several explanations for these findings, e.g., the smaller number of studies conducted in inpatient forensic mental health settings. Moreover, aggression treatment is often hampered because patients have severe chronic psychiatric and aggression problems. Engaging forensic patients in treatment is challenging. Attrition is high, and patients are often demoralized or unwilling to change their behavior and have difficulty applying therapeutic insights in their daily lives. Furthermore, previous research has shown that role-play contributed significantly to reductions in violent recidivism [7], but within forensic psychiatric settings, therapists often feel insecure provoking patients in a real-life setting.
Virtual Reality (VR) may solve several of the problems listed above. In a virtual environment, patients have the opportunity to explore their behavioral reactions to social situations and practice new behavior in a safe, controlled, realistic and personalized environment. Furthermore, VR interventions also focus on practicing behavior and not only on gaining cognitive therapeutic insights. Lastly, it is expected that VR is more enjoyable than current psychosocial therapies, which may result in better engagement in treatment, less drop-out and no-shows. We developed and tested a virtual reality aggression prevention therapy (VRAPT) targeting aggression in a sample of forensic psychiatric inpatients who stay in a highly secured setting [8].
The current study is a multicenter randomized controlled trial conducted in four forensic psychiatric centers. We investigated the effectiveness of VRAPT on aggressive behavior, by comparing VRAPT (in addition to treatment as usual) to waiting list control (treatment as usual). We hypothesized that VRAPT: 1. decreases both self-reported and staff-reported aggressive behavior; 2. reduces determinants of aggressive behavior, including anger, impulsivity, and hostility.

Study Design and Participants
This study was a randomized controlled trial in four forensic psychiatric centers (FPCs) in The Netherlands. Patients residing in FPCs are admitted under the judicial measure 'TBS-order' (in Dutch: ter-beschikking-stelling: this translates as 'detained under hospital order'). 'Detained under hospital order' means that the court has established a relation between the offense committed and a psychiatric disorder. Details of the study protocol have been published elsewhere [8].
Inclusion criteria were being a forensic psychiatric inpatient, and being referred by their clinical team to the study based on pre-admission history of aggression and/or current clinical problems with reactive aggression. Exclusion criteria were (history of) epilepsy, insufficient mastery of the Dutch language, and an intelligence quotient (IQ) below 70 as estimated by an IQ test or their treating specialist.
This study was approved by the medical ethical committee of the University Medical Centre Groningen, Groningen (number: NL52939.042.15) and was conducted according to the Declaration of Helsinki. The study was registered in the Dutch Trial Register (NTR, TC = 6340). Patients were informed about the study by a research assistant under the supervision of the head of treatment. Written informed consent was obtained, and patients were allowed to stop participating at any time without giving a reason, and without consequences for their further stay in the FPC. Participants from both groups received 35 euros for their participation in the assessments.

Randomization and Masking
Participants were randomly assigned to VRAPT or waiting list. An independent research coordinator of the University Medical Center Groningen performed the randomization using the program Research Randomizer. Randomization blocks of two (1:1) were generated for each participating FPC separately. Participants were informed of their allocation after the baseline measurement was completed. All assessments were done by research assistants. We aimed for assessors to be blinded to study condition, but this was not feasible because all staff members and research assistants had access to clinical records, for safety reasons and to know the whereabouts of their patients. Furthermore, most participating patients were not allowed to leave the treatment ward without staff, so staff were aware when a patient entered the VR-room. Because of this, only 30 measurements at post-treatment and 28 measurements at follow-up were completely single-blind.
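As an illustration only, allocation in blocks of two (1:1), generated separately per center, can be sketched as follows. The trial itself used Research Randomizer; the function name, site labels, number of blocks and seed below are hypothetical and are not taken from the study.

```python
import random

def block_randomize(sites, n_blocks, seed=None):
    """Return, per site, an allocation sequence built from blocks of two (1:1)."""
    rng = random.Random(seed)
    schedule = {}
    for site in sites:
        sequence = []
        for _ in range(n_blocks):
            # Each block holds exactly one slot per arm, in random order.
            block = ["VRAPT", "waiting list"]
            rng.shuffle(block)
            sequence.extend(block)
        schedule[site] = sequence
    return schedule

# Hypothetical example: four sites with 16 blocks each (32 slots per site).
schedule = block_randomize(["FPC A", "FPC B", "FPC C", "FPC D"], n_blocks=16, seed=2017)
for site, sequence in schedule.items():
    print(site, sequence[:4])
```

With blocks of two, the arms within a site never differ in size by more than one participant, which keeps allocation balanced even if a site stops recruiting early.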

Procedures
Participants completed assessments at baseline (T1; pre-treatment), post-treatment (T2), and 3-month follow-up (T3). During the study period, staff were asked to complete an observation scale for each participant once weekly, to monitor aggressive behavior on the ward. Observation started three months before the intervention and lasted until the end of follow-up. Participants who dropped out of the treatment were still included in the staff observation ratings. Treatment as usual consisted of standard treatments, including for instance: medication, psychomotor therapy, schema focused therapy, supportive counseling, skills training, or treatment for substance use disorders. Patients were allowed to participate in the study as long as their treatment as usual did not directly target their aggression problems. Participants on the waiting list were offered VRAPT after they had completed follow-up measurements. The participants from the treatment group completed a short interview with a research assistant at T3. See Figure 1 for the CONSORT flow diagram of VRAPT.

Intervention
VRAPT consisted of 16 individual twice-weekly sessions that lasted on average one hour. Sessions consisted of VR exercises to practice new and adequate behavior. The Social Information Processing (SIP) model was used as the theoretical framework for VRAPT. The SIP model describes the mechanisms by which individuals interpret and respond to social situations [9], and how these can lead to aggressive behavior. The model includes six cognitive-emotional steps, which were targeted in VRAPT (Figure 2). Except for the first introduction session, every session followed the same format: a short review of the previous session, clarification of a step of the SIP model, VR exercise, and discussing the VR exercise. Each session ended with an evaluation of the progress in learning goals [8]. Some examples of learning goals were: "I would like to learn how to react assertively, instead of aggressively. Therefore, I want to increase my skills, and to make sure I become less frustrated", "I would learn how to regulate my anger, and react in a more prosocial manner", "I want to discover which triggers and situations make me angry and react aggressively, so I can prevent future aggressive outbursts". The VRAPT exercises corresponded to the different steps of the model and became more difficult as the VRAPT progressed.
The first two steps of the model are referred to as early information processing, and they are about encoding and making attributions of internal and external social cues. VR sessions 1-5 consisted of exercises on facial emotion recognition, and recognizing aggressive behavior of other people. The next four steps of the SIP model are labeled late information processing, starting with step three: goal selection. This step involves deciding the desired outcome in a given social situation. Steps four, five and six are respectively generating, evaluating and enacting responses. Recent evidence suggests that reactive aggression is associated with both early and late information processing, and deficits in information processing are related to aggressive behavior [10]. In sessions 6-8, VR exercises focused on de-escalating aggressive behavior of others (i.e., avatars) and on regulating physical arousal (i.e., heart rate and skin conductance). In the second half of the therapy (VR sessions 9-16) all SIP steps were integrated into challenging interactive virtual role-plays. The interactive virtual social scenarios were designed in an iterative process with clinicians, VR experts, software engineers and researchers. The main focus of all sessions was teaching participants to cope with provocative behavior of others more adequately and to prevent their own aggressive outbursts. Three virtual environments (Figure 3) were created with Unity software by CleVR BV (Delft, The Netherlands). Within the virtual environments, participants could walk around with a Microsoft Xbox One controller. During sessions, participants wore an Oculus Rift 2 head-mounted display (Oculus VR, California, U.S.) and headphones while they were interacting with avatars. Virtual environments and avatars were controlled by the VRAPT therapist (e.g., avatars' body movements and facial expressions). Furthermore, therapists could role-play through avatars by using a microphone with voice morphing.
Because of this dynamic interactive nature of the VR software, VRAPT could be tailored to the specific needs of the participants. This allowed them to design their own learning goals and practice with specific triggers. During sessions 6-15, heart rate (HR) and galvanic skin response (GSR) were measured and displayed in real time at the therapist interface for feedback on physical arousal. At all times, the VRAPT therapist was in control of the virtual environment and was able to immediately change and/or stop the virtual environment if necessary (e.g., in case a participant became nauseous).

The VRAPT therapists were licensed psychologists and non-verbal therapists (e.g., creative therapists, psychomotor therapists). Therapists received 16 hours of training in working with the protocol and software of VRAPT from a licensed psychologist and the main author of the treatment protocol (author SKT). Therapists enrolled in VRAPT were required to have improvisation skills, and had to be familiar with role-plays. The dynamic VRAPT software allowed for personalized role-plays and exposure exercises with patients. During the training, therapists were taught how to select and play relevant social interactions, and to bring up core meanings of the aggressive situation. Furthermore, they were trained in modulating the verbal and non-verbal response of avatars in order to increase or decrease patients' aggression and emotional responses. An advantage, according to the therapists, was that they were able to test the boundaries of aggressive behavior without disturbing the treatment relationship. Although patients were aware that the avatar was played by the therapist, they said afterwards that they were angry at the avatar, and not at the therapist. To ensure treatment integrity, a semi-structured protocol was designed as guidance for the VRAPT sessions. In addition, monthly one-hour group videoconferences were organized for VRAPT therapists to encourage treatment fidelity and discuss ongoing treatments. Furthermore, during the interventions, therapists had to complete session forms, which were checked by the research assistants.

Primary Outcome-Aggression
The primary outcome was level of aggressive behavior post-treatment, assessed both with staff observation and self-report. Both methods were used because they can complement each other, as a limitation of self-report questionnaires in this population is social desirability, and staff observations have limitations as aggressive behavior can occur in unobserved situations or observations can be biased. Therefore, both observation and self-report were used to allow a complete picture of the nature, degree and severity of aggressive behavior.

Social Dysfunction and Aggression Scale (SDAS)-Completed by Staff Members
Staff completed the 9-item Social Dysfunction and Aggression Scale (SDAS) [11,12] weekly, from three months prior to baseline continuously until the end of follow-up. Items were scored on a 4-point scale ranging from absent (1) to severe (4). For each item, a general and a peak score were recorded [13]. The peak score refers to the most severe aggressive behavior, and the general score to the second most severe aggressive behavior in the same week. The SDAS peak score post-treatment (T2) was the primary outcome of the study. Research assistants collected SDAS forms on all wards on a weekly basis. In case of a missing form, they reminded the staff members. If forms had not been returned after two weeks, research assistants completed the SDAS form retrospectively, based on clinical patient files. In total, 91.5% of 4565 SDAS forms were completed on time by staff members.

Aggression Questionnaire (AVL)-Completed by Participants
Participants completed the Dutch version of the Aggression Questionnaire (AVL) [14]. This 29-item questionnaire assesses four subtraits of aggression, i.e., physical aggression, verbal aggression, anger and hostility [15]. The AVL total score post-treatment (T2) was the other primary outcome of the study.

Other Measures
Other measures included sociodemographic information, the Child Trauma Questionnaire-Short Form (CTQ-SF) [22], the Igroup Presence Questionnaire (IPQ) [23], and a short interview at follow-up about experiences with VRAPT, conducted by a research assistant. The aim of the interview was to collect user experiences and to provide information on the impact, benefits and disadvantages of VRAPT as perceived by the participants.

Statistical Analyses
No prior studies on VR aggression prevention interventions were available. We aimed for a moderate effect size of 0.5 on primary outcomes. Using this effect size with a power (1 − β) of 0.80, an alpha of 0.05 and an independent two-sided t-test to evaluate the main outcome, 64 subjects were required in each condition, total N = 128.
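The sample size above can be checked with a short calculation. The sketch below is not the authors' procedure (the text does not name the software used for the power analysis); it uses the normal approximation for a two-sided, two-sample comparison plus a small-sample correction term of z²/4 that is commonly attributed to Guenther, which is an assumption introduced here to approximate the exact t-test result.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_power = z.inv_cdf(power)           # quantile corresponding to the desired power
    n_normal = 2 * (z_alpha + z_power) ** 2 / d ** 2
    # Assumed small-sample correction so the result approximates the t-test.
    n_corrected = n_normal + z_alpha ** 2 / 4
    return ceil(n_corrected)

n = n_per_group(0.5)
print(n, 2 * n)  # prints: 64 128
```

Without the correction term, the plain normal approximation gives 63 per group, which is why the uncorrected formula slightly understates the requirement reported in the text.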
There were missing values on several questionnaires because participants were unwilling to complete some questions, did not understand a question, or forgot to answer it. Subscales and total scores were still computed if one missing value was present in a scale with fewer than ten items; for subscales with more than ten items, two missing values were allowed.
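The missing-value rule can be expressed as a simple check. This is a minimal sketch under stated assumptions: the function names are hypothetical, missing answers are represented as None, and a scale with exactly ten items (which the text does not cover) is treated like the shorter scales. How the score itself was computed when an item was missing is not described in the text, so the sketch only decides whether a scale may be scored.

```python
def max_missing_allowed(n_items):
    """Number of missing items tolerated for a scale of the given length."""
    # Source rule: one missing value for scales with fewer than ten items,
    # two for scales with more than ten items. Exactly ten items is not
    # specified in the text; allowing only one is an assumption.
    return 2 if n_items > 10 else 1

def is_scorable(responses):
    """Return True if the scale has few enough missing items to be scored."""
    n_missing = sum(1 for r in responses if r is None)
    return n_missing <= max_missing_allowed(len(responses))

# Hypothetical 9-item scale with one unanswered item: still scorable.
print(is_scorable([1, 2, None, 4, 1, 1, 2, 3, 2]))
```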
Data were analyzed with IBM SPSS Statistics (Version 24.0. Armonk, New York, USA). Significance was accepted at 0.05. To identify any baseline differences between the VRAPT and waiting list group, groups were compared on baseline values of outcome measures, socio-demographic and clinical characteristics with t-tests, chi-squared tests or the non-parametric Mann-Whitney U test. Next, VRAPT completers and VRAPT treatment dropouts were compared on baseline characteristics.
Intention-to-treat analyses were performed on all outcome measures. Outcomes were analyzed with multilevel analysis (MIXED command). Multilevel analyses were performed because the data have a hierarchical structure: repeated measures (level 1) are nested within individuals (level 2). Assumptions for multilevel analyses were checked. Multilevel models included fixed effects for time (baseline-post-treatment or baseline-follow-up), group (VRAPT or waiting list) and the interaction time X group, and a random intercept for participant. The covariance structure was set to identity and models were estimated with the maximum likelihood method. There were baseline differences between groups in age, study site, baseline SDAS peak score and highest completed education level; these variables were therefore included as covariates in all analyses. Treatment effects were established by the time X group interaction, separately for post-treatment (T2) and follow-up (T3), each compared with pre-treatment (T1). Effect sizes for treatment effects at post-treatment and follow-up were calculated with Cohen's d, by calculating the effect size for the difference scores T1-T2 as well as T1-T3 between groups [24].
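To make the effect size calculation concrete, the sketch below computes Cohen's d on difference scores using the pooled standard deviation of the change scores in the two groups. This is one common convention and an assumption here; the exact formula from reference [24] is not given in the text and may differ. The function name and the example scores are made up for illustration and are not trial data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d_change(pre_a, post_a, pre_b, post_b):
    """Cohen's d for the between-group difference in change scores (post - pre)."""
    change_a = [post - pre for pre, post in zip(pre_a, post_a)]
    change_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(change_a), len(change_b)
    # Pooled variance of the change scores (sample variances, n - 1 denominators).
    pooled_var = ((n_a - 1) * variance(change_a)
                  + (n_b - 1) * variance(change_b)) / (n_a + n_b - 2)
    return (mean(change_a) - mean(change_b)) / sqrt(pooled_var)

# Made-up scores: group A (treatment) and group B (control), T1 and T2.
d = cohens_d_change([10, 12, 14, 16], [8, 9, 13, 12],
                    [10, 12, 14, 16], [10, 11, 14, 15])
print(round(d, 2))
```

With change defined as post minus pre, a negative d means the first group decreased more than the second, which for aggression measures corresponds to a greater improvement.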

Results
In total, 128 forensic inpatients were included between 1 March 2017, and 31 December 2018. See Figure 1 for the inclusion flow-chart. Sociodemographic and clinical characteristics are presented in Table 1.
Of the VRAPT group, 67% completed all 16 VRAPT sessions. Thirteen patients (20%) dropped out during the therapy. These drop-out rates are similar to those of other studies on psychological therapy for aggression in inpatient wards [25-27]. Reasons for discontinuation of VRAPT were drug use, cardiac arrhythmia, security restrictions, feeling uncomfortable during role-playing, and transfer to another clinic that was not participating in this study. Eight patients (13%) never started VRAPT after randomization. Patients who completed the VRAPT intervention did not differ significantly from the treatment drop-outs, except for age of first conviction. Means and standard deviations of all outcome measures are presented in Table 2, and test results are presented in Table 3. Concerning primary outcomes, there were no significant changes in staff-rated aggression in either group. The means of self-reported aggression decreased in both the VRAPT and the control group, though these changes were not significant (T1-T2, F = 1.91(101.04), p = 0.17; T1-T3, F = 1.44(100.42), p = 0.23), and there was no effect of VRAPT treatment.
Note. * These variables do not add to a total score as there is overlap between, e.g., diagnoses and types of offenses committed by the same person.
From baseline to post-treatment, significant treatment effects (time X group interaction) were observed for aggression and hostility (BDHI-D total), direct aggression (subscale BDHI-D), non-planning (subscale BIS-11), anger control-out (subscale STAXI-2), and anger expression index (subscale STAXI-2). In all of these subscales/total scores, the VRAPT group improved more than the waiting list group. However, these improvements were not maintained at three-month follow-up. No significant treatment effects were found for primary or secondary outcome measures at three-month follow-up. According to the interviews at follow-up, most participants valued VRAPT as an addition to their current treatment. Half of them experienced positive changes in their daily life (Table 4).

Table 4. Answers to interview questions for participants of the VRAPT condition who completed the therapy at follow-up (N = 38). Columns: Question; Answers, n (%).
Note: n (%) refers to the number and percentage of participants who provided a certain answer. Data of 4 participants were missing at follow-up.

Discussion
The present study investigated the effect of a novel virtual reality aggression prevention therapy (VRAPT) in forensic psychiatric inpatients. We engaged and treated a large sample of forensic psychiatric inpatients with aggressive behavior problems. No significant improvements were found after VRAPT compared to waiting list on the primary outcomes: staff-rated aggressive behavior and self-reported aggression. With regard to secondary outcomes, self-reported aggression, anger, hostility and impulsivity decreased in both groups over time. We found positive treatment effects of VRAPT on self-reported direct aggression and hostility, anger control skills, anger expression index and non-planning impulsiveness (i.e., self-control and cognitive complexity). These effects were not maintained at 3-month follow-up.
The findings of this study are in line with the modest results of previous aggression treatment studies in forensic populations [5,6]. Recently, the largest controlled effectiveness study of ART on recidivism among convicted adult offenders was performed in the Swedish Prison and Probation Services. The study compared 1124 offenders who began ART with 3372 matched comparison offenders who did not receive ART (2003-2009) [28]. Intention-to-treat analyses suggested no advantage for ART in reducing recidivism, although subgroup analyses with ART completers only suggested small reductions in general (not violent) reoffending.
Thus, previous literature indicates that aggressive behavior of forensic patients is not easily changed. Recommendations for improvement of aggression treatments included use of other perspectives than cognitive behavioral therapy, such as individualized training of new skills for real-life risk situations, focusing on destabilizers (i.e., factors that impair decision making), and role-playing as a key component of treatment [29]. Furthermore, it was highlighted that studies that used role-playing were more effective in reducing (violent) re-offending than those that did not [29]. Although our intervention incorporated these recommendations, the results of our study revealed that VRAPT was also not effective in reducing aggressive behavior of forensic psychiatric inpatients, based on staff ratings and self-report. There are several possible explanations for these results.
First, the content of VRAPT was based on the SIP model [9,30], which is often used as an explanatory model for reactive aggression in children and adolescents, and is also a part of other aggression treatments such as ART [6]. However, this model has some limitations. Early experiences (e.g., childhood trauma) and emotional states are not explicitly represented in the steps of the SIP model, whereas they are quite relevant for aggressive behavior. For instance, childhood trauma (such as emotional abuse or witnessing violence) is highly prevalent among forensic inpatients and has been associated with aggressive behavior in adulthood [31]. Therefore childhood trauma may well be an important determinant of aggression in prisoners [32]. Furthermore, childhood trauma was associated with increased social stress reactivity in a previous VR study [33]. Thus, exploration of impact of childhood trauma on emotions and behavior in social interactions might be a valuable addition to VRAPT.
Second, the VRAPT approach was transdiagnostic. Whereas this design has important advantages in forensic settings, the considerable comorbidity of psychiatric disorders among patients in forensic psychiatry may have contributed to the lack of significant treatment effects. For instance, appraisals and attributions regarding others' behavior are also dependent on diagnosis. In psychosis, the tendency of patients to perceive others as hostile is linked to paranoid ideation [34]. However, in other disorders, these attributions of intent may be affected by other cognitive biases or psychological mechanisms, which may have been targeted to a lesser degree in VRAPT. Thus, in the current study, the inclusion of patients with different psychiatric disorders may have reduced the overall effect of the VRAPT.
Third, generalization of VRAPT experiences to daily life may have been less than optimal. Participants did not receive homework assignments and were not explicitly encouraged to practice new skills and behavior on the wards between the VRAPT sessions. Furthermore, there was a discrepancy between the VR environments and the patients' daily life. Some situations, e.g., being in a bar or supermarket, did not directly relate to their life in the highly secured treatment ward. Due to the study design, VRAPT was used as a separate treatment module, and this may have hampered the integration of VRAPT in regular care and treatment.
Fourth, VR might not be suitable for treatment of aggression of forensic psychiatric patients. However, this is not very likely, as the motivation of forensic psychiatric inpatients to participate in VRAPT was high, as evidenced by the high inclusion rate. In addition, most patients and therapists were positive about VRAPT, and during the follow-up interviews, many participants were able to recall what they had learned (e.g., more insight into their triggers; awareness of their physiological arousal). Moreover, during sessions, therapists observed that the VR role-plays clearly elicited emotions and behavior, indicating that participants were able to practice and were immersed in the VR environments.
Fifth, the outcome measures may have been suboptimal. The primary goal of VRAPT was to reduce aggression by improving social information processing mechanisms and training participants to react adequately in provocative and challenging situations. Our findings are similar to the findings from a meta-analysis that showed no significant treatment effect of aggression therapy on institutional behavior [7]. In the current study, behavior and skills in social situations were not explicitly measured, and no validated questionnaires were available to measure the different steps of the SIP model. Furthermore, the use of questionnaires demands a certain level of cognitive skills and insight, which not all participants may have had. In addition, while the SDAS is a behavioral observation scale, it was measured on the wards. Some patients were already working outside the clinic, and some incidents may have remained out of the sight of staff. Further, the SDAS intends to measure subtle forms of aggression, such as irritation, not cooperating with staff and negativism. Staff members in forensic psychiatry are often confronted with more violent forms of aggression and therefore may overlook, or not notice, more subtle forms of aggression.

Strengths and Limitations
A strength of this study was that we successfully conducted a relatively large, rigorous randomized clinical trial of aggression treatment in forensic inpatients, which has hardly been done before [5]. We succeeded in including 128 inpatients of four forensic psychiatric centers within 1.5 years. This high inclusion rate seems to reflect a high willingness to engage in VR therapy, which is important as forensic psychiatric inpatients are often hard to motivate. Although drop-out rates are comparable to other studies, most reasons for drop-out were not related to the VR. Second, to the best of our knowledge, this study was the first to investigate a VR-based treatment in forensic psychiatry. A recent systematic review concluded from pioneering studies that VR is a promising method in forensic psychiatry, but did not find any published VR intervention study [35]. The current study may therefore break new ground in this field of research.
This study also had several limitations. Self-report measurements may have been too difficult for some participants, as some sentences and statements within the self-report questionnaires required a relatively high language level. Another limitation was a possible selection bias: the most aggressive inpatients were not approached to participate, because their treatment supervisors did not consent to participation in view of possible risks. Participants who dropped out of VRAPT also had higher average SDAS scores and were on average 3 years younger at their first conviction than the group that completed treatment. A considerable part of the drop-outs had relatively severe aggression problems, thus the training may have been too intensive or confronting for them. However, most reported reasons for drop-out were attributable to treatment motivation problems in general, such as aggression or drug incidents on the wards. In addition, human aggression is complex behavior. Effective therapy for forensic patients with chronic and severe psychiatric and behavioral problems requires thorough and repeated training of new behavior and the integration of several factors, such as the social environment, individual factors and current emotions. It is questionable whether 16 VRAPT sessions were sufficient to achieve this, given an average TBS treatment period of eight years in an FPC [7]. In addition, within this highly secured setting, it was not possible to keep research assistants blind to the condition of the participants. Research assistants were employees of the institutions and had to have access to the daily reports of the inpatients for clinical and security reasons. Further, many participants could not be prevented from talking about their VR experiences during the measurements.
In the current study, collection of pharmacological data (i.e., medication) was hampered by limited access to medical files. For future studies, it would be relevant to collect these data and assess the possible influence of medication on the treatment of aggression. Also, in the current study we did not have information on eligibility assessment. Treatment supervisors wanted to screen their patients for eligibility and willingness to participate before the patients were visited by the independent research assistant. It was not feasible to collect data on how many patients they had screened. Finally, for secondary outcomes the p-values were not corrected for multiple testing, which could have resulted in false-positive findings.
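To illustrate the multiple-testing concern, the following minimal sketch shows how the family-wise error rate grows with the number of tests and how a Holm step-down correction could be applied. The p-values and the number of tests are hypothetical placeholders, not results from this trial, and the error-rate formula assumes independent tests, which is unlikely to hold exactly for correlated questionnaire outcomes.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive across m independent
    tests, each run at significance level alpha (independence is an
    assumption made for illustration)."""
    return 1.0 - (1.0 - alpha) ** m


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns one boolean per p-value,
    True where the null hypothesis is rejected after correction."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for four secondary outcomes (NOT trial data).
hypothetical_p = [0.004, 0.03, 0.02, 0.20]

print(familywise_error_rate(0.05, len(hypothetical_p)))
print(holm_reject(hypothetical_p))
```

In this hypothetical example, three of the four p-values fall below the uncorrected threshold of 0.05, but only the smallest survives the Holm correction.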

Implications and Future Research
A key implication of the current study is that it is difficult to establish treatment effects and to change the aggressive behavior of forensic inpatients. Many forensic inpatients reside in prisons and forensic psychiatric care for many years, and have persistent psychiatric, behavioral and motivational problems. More comprehensive theoretical models and more clinical research are needed to improve the efficacy of aggression therapy as part of a comprehensive treatment program for forensic psychiatric inpatients. As participants and therapists repeatedly reported that VRAPT was a relevant addition to patients' treatment programs and that they enjoyed practicing new behavior in VR, further research on VR aggression treatment is warranted. Effectiveness of VRAPT may be improved by adding and repeating sessions, creating virtual scenarios closer to the day-to-day experiences of inpatients, integrating VR with other components of the treatment program and extending biofeedback functionalities.
In future studies, implicit aggression measurements and role-play measures to evaluate social information processing or new behavior in (provocative) social situations may be added to the self-report questionnaires and behavioral observations. One needs to be aware of the highly restricted environment in which the patients are supported by staff to inhibit anger and avoid conflicts. They can present themselves in a socially desirable way, or come across as detached, defiant or uncooperative, which influences staff observations [36]. Therefore, measuring aggressive ideations in a more implicit or experimental way may provide more insight into the aggression of forensic psychiatric inpatients. Furthermore, when VRAPT is applied in a sample of participants who are already working or residing outside an FPC, it may be worthwhile to consider observer-rated measures that include the perceptions of relatives and friends, in addition to staff observations. This could provide a more meaningful reflection of change.
Finally, VR aggression treatment should be investigated in other populations with aggressive behavior problems, such as prisoners in a regular prison ward. VRAPT could also be tested in forensic and non-forensic outpatient samples. Forensic patients who are under semi-restricted measures (i.e., can go outside at particular times) or who are about to be discharged may benefit more from VRAPT. Outpatients experience more provocative triggers in their daily life than inpatients in a highly restrictive environment. In addition, outpatients can be expected to have a more intrinsic motivation to seek help in reducing aggressive behavior. Moreover, outpatients are not punished when displaying (minor) aggression, so they are probably more open about misbehavior. Therefore, these populations might benefit from VRAPT, as they have more opportunities to practice and generalize what they have learned in real life.