School-based education programmes for the prevention of child sexual abuse

BACKGROUND
Child sexual abuse is a significant problem that requires an effective means of prevention.


OBJECTIVES
To assess: if school-based programmes are effective in improving knowledge about sexual abuse and self-protective behaviours; whether participation results in an increase in disclosure of sexual abuse and/or produces any harm; knowledge retention and the effect of programme type or setting.


SEARCH STRATEGY
Electronic searches of the Cochrane Central Register of Controlled Trials, MEDLINE, EMBASE, PsycINFO, CINAHL, Sociological Abstracts, Dissertation Abstracts and other databases, using MeSH headings and text words specific for child sexual assault and randomised controlled trials (RCTs), were conducted in August 2006.


SELECTION CRITERIA
RCTs or quasi-RCTs of school-based interventions to prevent child sexual abuse compared with another intervention or no intervention.


DATA COLLECTION AND ANALYSIS
Meta-analyses and sensitivity analysis, using two imputed intraclass correlation coefficients (ICC) (0.1, 0.2), were used for four outcomes: protective behaviours, questionnaire-based knowledge, vignette-based knowledge and disclosure of abuse. Meta-analysis was not possible for retention of knowledge, likelihood of harm, or effect of programme type and setting.
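The ICC-based sensitivity analysis described above can be illustrated with the standard design-effect approximation for cluster-allocated trials. This is a minimal sketch, not the review's own computation: the arm size (200) and class size (25) are hypothetical, the function names are assumptions made for illustration, and the text does not state which exact adjustment formula was applied.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Standard design effect: 1 + (m - 1) * ICC, where m is the mean cluster size."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Shrink a trial arm's sample size to account for clustering."""
    return n / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical arm: 200 children taught in classes of 25 (not data from the review).
    for icc in (0.1, 0.2):  # the two imputed ICC values named in the text
        n_eff = effective_sample_size(200, 25, icc)
        print(f"ICC={icc}: effective sample size = {n_eff:.1f}")
```

A larger ICC gives a larger design effect and therefore a smaller effective sample size, which widens the confidence interval of any trial analysed at the individual level despite cluster allocation.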


MAIN RESULTS
Fifteen trials measuring knowledge and behaviour change as a result of school-based child sexual abuse intervention programmes were included. Over half the studies in each initial meta-analysis contained unit of analysis errors. For behaviour change, two studies had data suitable for meta-analysis; results favoured intervention (OR 6.76, 95% CI 1.44, 31.84) with moderate heterogeneity (I(2)=56.0%) and did not change significantly when adjustments using intraclass coefficients were made. Nine studies were included in a meta-analysis evaluating questionnaire-based knowledge. An increase in knowledge was found (SMD 0.59; 0.44, 0.74; heterogeneity I(2)=66.4%). When adjusted for an ICC of 0.1 and 0.2, the results were SMD 0.6 (0.45, 0.75) and 0.57 (0.44, 0.71) respectively. Heterogeneity decreased with increasing ICC. A meta-analysis of four studies evaluating vignette-based knowledge favoured intervention (SMD 0.37 (0.18, 0.55)) with low heterogeneity (I(2)=0.0%) and no significant change when ICC adjustments were made. Meta-analysis of between-group differences of reported disclosures did not show a statistically significant difference.
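The heterogeneity statistic reported above, I(2), is conventionally derived from Cochran's Q. The sketch below shows that calculation under stated assumptions: it uses inverse-variance fixed-effect weights, the effect sizes and variances are hypothetical, and the function names are chosen for illustration. It does not reproduce the review's data or software.

```python
def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted mean of study effect sizes."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, weights


def i_squared(effects, variances):
    """I^2 (%) = max(0, (Q - df) / Q) * 100, with Q being Cochran's Q."""
    pooled, weights = pooled_fixed_effect(effects, variances)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Hypothetical standardised mean differences and their variances.
    smds = [0.2, 0.8]
    variances = [0.01, 0.01]
    print(f"I^2 = {i_squared(smds, variances):.1f}%")
    # Inflating each variance (as a clustering adjustment does) lowers Q and hence I^2.
    inflated = [v * 3.4 for v in variances]
    print(f"I^2 after variance inflation = {i_squared(smds, inflated):.1f}%")
```

Because a clustering adjustment inflates each study's variance, Q shrinks while the degrees of freedom stay fixed, which is consistent with the observation that heterogeneity decreased with increasing ICC.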


AUTHORS' CONCLUSIONS
Studies evaluated in this review report significant improvements in knowledge measures and protective behaviours. Results might have differed had the true ICCs from studies been available or cluster-adjusted results been available. Several studies reported harms, suggesting a need to monitor the impact of similar interventions. Retention of knowledge should be measured beyond 3-12 months. Further investigation of the best forms of presentation and optimal age of programme delivery is required.


Questionnaire-based knowledge (factual knowledge measured by assessing responses to items on a questionnaire or multi-choice test, immediately post intervention) (higher score = higher knowledge)
The mean knowledge score measured using a variety of scales across control groups ranged from 3 to 64.
The mean knowledge score in the intervention groups was 0.61 standard deviations higher (0.45 higher to 0.78 higher).
(18)
⊕⊕⊕ moderate 2

Vignette-based knowledge (applied knowledge measured by assessing responses to hypothetical scenarios, immediately post intervention) (higher score = higher knowledge)
The mean knowledge score measured using a variety of instruments across control groups ranged from 1 to 42.
The mean knowledge score in the intervention groups was 0.45 standard deviations higher (0.24 higher to 0.65 higher).
1688 (11)
⊕⊕⊕ moderate 2
Results favoured intervention

Harm (measured using anxiety or fear questionnaires)
The mean anxiety or fear score measured using a variety of scales across control groups ranged from 2 to 7.
The mean anxiety or fear score in the intervention groups was 0.08 standard deviations lower (0.22 lower to 0.07 higher).
⊕⊕⊕ moderate 3
Results showed no increase or decrease in anxiety or fear

Results favoured intervention, however when adjusted for unit of analysis errors, this effect disappeared

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval; RR: risk ratio; OR: odds ratio

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

Description of the condition
Child sexual abuse is a problem of considerable magnitude with short- and long-term repercussions for those victimised. There is no universal definition of child sexual abuse (Macdonald 2001; Trickett 2006). It is a term used to describe a range of experiences involving a child in unwanted, inappropriate, coercive, and unlawful sexual exploitation by an adult or older child. The World Health Organization (WHO) definition states that "child sexual abuse is the involvement of a child in sexual activity that he or she does not fully comprehend, is unable to give informed consent to, or for which the child is not developmentally prepared and cannot give consent, or that violates the laws or social taboos of society" (WHO 1999, p 15). Child sexual abuse is categorised along a continuum according to the type of abuse experienced by the child: involving physical body contact (using the term 'contact child sexual abuse') or not involving physical body contact (using the term 'non-contact child sexual abuse'). Contact acts include unwanted touching, fondling, masturbation, frottage, oral-genital contact, and vaginal or anal penetration by a penis, finger or other object. Non-contact acts include making sexual comments, voyeurism ('peeping'), exhibitionism ('flashing'), exposing a child to pornography, or making pornography (Finkelhor 2008; Putnam 2003). Recent meta-analyses of data collected from retrospective studies of adults in countries and cultures worldwide estimate that 10% to 20% of female children, and 5% to 10% of male children, have experienced child sexual abuse on a spectrum from exposure through unwanted touching to penetrative assault before the age of 18 years (Barth 2013; Ji 2013; Pereda 2009; Stoltenborgh 2011). These data are likely to underestimate its true prevalence because two-thirds of individuals never disclose their victimisation (London 2005) and most cases go unreported to authorities (Wyatt 1999).
The WHO estimates that child sexual abuse contributes to seven to eight per cent of the global burden of disease for females, and four to five per cent for males (Andrews 2004). Child sexual abuse is associated with adverse psychosocial outcomes such as depression (Roosa 1999), post-traumatic stress disorder (Widom 1999), antisocial and suicidal behaviours (Bensley 1999), eating disorders (Perkins 1999), alcohol and substance abuse (Spak 1998), post-partum depression and parenting difficulties (Buist 1998), sexual re-victimisation, and sexual dysfunction (Fleming 1999). A recent meta-analysis found child sexual abuse was also associated with higher rates of physical health conditions, including gastrointestinal, gynaecological, and cardiovascular problems, and obesity (Irish 2010). A longitudinal analysis of the association between childhood sexual abuse and educational achievement found a clear linear relationship between increasing severity of child sexual abuse and poorer educational achievement; however, the relationship was confounded by sociodemographic characteristics (e.g. lower maternal age and qualifications) and family functioning variables (e.g. inter-parental violence) known to be associated with child maltreatment (Boden 2007). These consequences reach far into families and communities, with significant costs for institutions in terms of primary and rehabilitative health care, education and welfare assistance, child protection, and the justice system (Fang 2012).
Given the retrospective nature of many studies, it is unclear what proportion of survivors go on to experience adverse outcomes and how sexual abuse interacts with other potential risk factors for these adverse outcomes. However, outcomes are known to vary for individuals according to: child age and gender; perpetrator age and gender; the relationship between child and perpetrator; the severity, duration, and/or frequency of the abusive act(s); accompanying physical or emotional violence and/or force; and the presence of other forms of victimisation (Putnam 2003; Trickett 1997). Sexual abuse has been reported across all socioeconomic and ethnic groups, in both males and females, and perpetrators can include those outside the family as well as within it (Finkelhor 1993); they can be adults or other young people (Turner 2011). However, not all children are at equal risk. Risk factors for child sexual abuse, mainly identified in Western countries, include being female (Fergusson 1996), having a physical or mental disability (Westcott 1999), living without a natural parent (Finkelhor 1986; Finkelhor 1990), parental mental illness, parental alcohol or drug dependency, and young maternal age (Fergusson 1996; Holmes 1998; MacMillan 2013). Girls appear to be more likely to be sexually abused by family members and boys by non-family members (Finkelhor 1990). The time of greatest vulnerability for child sexual abuse is between 7 and 12 years of age (Finkelhor 1986).

Description of the intervention
This review focuses on the most widely used strategy for the prevention of child sexual abuse: the provision of school-based programmes. Some terms commonly used to describe these programmes include: personal safety education (NCMEC 1999); protective behaviours (Flandreau-West 1984); personal body safety (Miller-Perrin 1990); body safety (Wurtele 2007); and child assault prevention and child protection education (NSW Department of School Education 1998). These programmes target children and adolescents aged 5 to 18 years who are students in primary (elementary) or secondary (high) schools. Support for interventions of this type can be found in Article 19 of the United Nations Convention on the Rights of the Child, an international law, which states that governments should "take all appropriate legislative, administrative, social and educational measures to protect the child from all forms of physical or mental violence, injury or abuse, neglect or negligent treatment, maltreatment or exploitation, including sexual abuse" (United Nations 1989).
Education programmes to reduce the occurrence of sexual abuse in children and adolescents were first developed by women's rape prevention collectives in the United States of America (USA) in the 1970s (Berrick 1991). School-based programmes for the prevention of child sexual abuse were rapidly and widely adopted across the USA, assisted in some states by policy mandates, and by the mid-1990s it was estimated that two-thirds of 10- to 16-year-olds in the USA had participated in such programmes (Finkelhor 1995c). Schools are a logical choice for teaching children about sexual abuse and its prevention, given their primary function is to educate (Wurtele 2009), and the content of prevention programmes aligns with prescribed school health curricula (Walsh 2013). Hence, schools have emerged as an important primary and secondary prevention setting providing access to large populations of children and adolescents, and relatively economical service delivery, without stigmatising those who may be at particular risk (Wurtele 2010). School-based child sexual abuse prevention programmes are typically presented to groups of students and are tailored to ages and cognitive levels. Programme content covers themes such as body ownership; distinguishing types of touches; identifying potential abuse situations; avoiding, resisting, or escaping such situations; secrecy; and how and whom to tell if abuse has occurred (Duane 2002; Topping 2009). Many programmes also stress that the child or adolescent is not to blame. Programmes vary in the number of these themes they cover and the extent to which each is covered. There is considerable variability in programme delivery formats and teaching methods. Formats such as books, comics, dramatic plays, puppet shows, films, lectures, and discussions have been used, with some programmes employing single formats, whereas others use combinations of formats (Duane 2002; Topping 2009; Wurtele 1987a).
Programme teaching methods have been conceptualised on a continuum from those employing purely didactic approaches, such as a speech, address, or talk, stressing students' passive listening and acquisition of knowledge, to those employing behavioural approaches, such as modelling, and emphasising students' active participation in role-play, rehearsing, or practising new self protection skills (Wurtele 1987a). The duration and frequency of programmes is diverse, with 30 minutes being a common length as this fits with a standard school lesson period. Programmes also vary in their scope with some programmes dealing only with child sexual abuse, whereas others integrate these themes into programmes covering broader issues such as general safety education, social and emotional learning, mental health and well being, respectful relationships, and sexuality education. This review focuses only upon interventions in which prevention of child sexual abuse is the main goal.

How the intervention might work
The ultimate goal of child sexual abuse prevention education is to prevent children from ever experiencing abuse. It is also important, in cases where children have experienced abuse, for adults to respond quickly and effectively to disclosures, to protect them from further victimisation, and to limit the harm caused. From a public health perspective (Rosenberg 1991), comprehensive approaches to child sexual abuse would involve multiple "prevention targets", including (i) offenders and potential offenders, (ii) children and adolescents, (iii) situations, and (iv) communities (Smallbone 2008, p 47).
Although not yet rigorously researched, it appears that school-based programmes may also work to enhance community capacity for sexual abuse prevention by raising awareness and delivering information to multiple members of children's social systems (Duane 2002), via provision of information packages to parents, training for teachers, and family participation in homework activities. School-based sexual abuse prevention programmes focus on children and adolescents as prevention targets. They seek to prevent child sexual abuse by providing students with knowledge and skills to recognise and avoid potentially sexually abusive situations, and with strategies to physically and verbally repel sexual approaches by offenders. They endeavour to minimise harm by disseminating messages about appropriate help seeking in the event of abuse or attempted abuse. Interventions aim to transfer the knowledge and skills learned by the child or adolescent in the classroom to real-life situations. Interventions work by capitalising on principles used by classroom teachers, most notably social cognitive learning theories (Bandura 1986; Vygotsky 1986), which stress the social context of learning via the use of instruction, modelling, rehearsal, reinforcement, and feedback (Wurtele 1987a). Do programmes actually prevent child sexual abuse? There is some evidence from a small group of studies, all of which have been conducted in the USA, that participation in school-based child sexual abuse prevention programmes may decrease the occurrence of child sexual abuse. A study of 2000 10- to 16-year-olds found that those exposed to more comprehensive prevention education were more knowledgeable about sexual abuse, more likely to report using self protection strategies, more likely to report protective efficacy, more likely to have disclosed their victimisation, and less likely to engage in self blame (Finkelhor 1995a).
In a follow-up study, the same individuals were more likely to use the protective strategies they had been taught when confronted with threats and assaults (Finkelhor 1995b). Two studies with high-school (Ko 2001) and college students (Gibson 2000) showed programmes were associated with reduced incidence of child sexual abuse. However, these studies are subject to the limitations of retrospective recall and have not been replicated with larger and more diverse samples. Research with sexual offenders on their perceptions of the efficacy of children's self protection strategies in actual abuse situations has found the most effective strategy, reported by three-quarters of offenders, was to tell the offender they did not want to participate in sexual activities. Girls under the age of 12 years effectively used six strategies to avoid abuse: demanding to be left alone, saying they would tell someone, crying, saying they were scared, saying that they did not want to, and saying "no" (Leclerc 2011). These strategies are key content in school-based child sexual abuse prevention programmes (Duane 2002).

Why it is important to do this review
Despite widespread adoption into the school curriculum in many countries, conclusions about the effectiveness of school-based programmes for the prevention of child sexual abuse remain tentative. A number of research synthesis studies have been conducted on this topic in the form of meta-analyses, and systematic and narrative reviews (see Table 1: Previous reviews). However, the findings have been limited by methodological weaknesses in the reviews (e.g. including non-randomised as well as randomised studies; aggregation of diverse outcomes; inappropriate analytical approaches), and in the individual studies included in the reviews (e.g. use of diverse measures; inadequate measurement of programme fidelity). Additionally, previous meta-analyses have differed in their parameters and have not been replicated. Further, there are historical distinctions in previous reviews, for example, the classification of programmes as primarily active or passive, behavioural or instructional, that warrant further exploration; this particular distinction seems artificial from an educational perspective because many programmes are, in practice, multifaceted, involving a number of teaching methods that are used in integrated ways to deliver programme content (MacMillan 1994). What is needed is a way of identifying, more precisely, the range of child, programme, and study design characteristics that may moderate programme effectiveness. Evaluations of discrete programmes have been limited to authors assessing and reporting on one or more of five measures: (i) knowledge gains, (ii) skills gains, (iii) sexual abuse disclosures, (iv) negative programme effects or harms, and (v) subsequent incidence of child sexual abuse (Smallbone 2008). Consistent with previous reviews, the original Cochrane review found improvements in knowledge and protective behaviours (skills) among children who had received school-based programmes (Zwi 2007).
Findings on disclosures, harm, and retention of knowledge over time were inconclusive. As the most rigorous of the reviews conducted to date (Mikton 2009), and the only review to include risk of bias analyses, the original review also uncovered many methodological quality issues that warrant ongoing monitoring and review. This is important because the historical controversy over school-based child sexual abuse prevention programmes is concentrated on two outcomes: programmes' actual effectiveness in preventing child sexual abuse, and concerns over negative programme effects (Finkelhor 2007). Evidence on programmes' effectiveness with regard to the fifth and arguably the most important measure, the degree to which programmes actually reduce the incidence of child sexual abuse, remains a pressing and unanswered empirical question that requires ongoing review.
It has been suggested that education programmes can cause harm to participating children and adolescents (Taal 1997). This is reported to be a common parental concern (Finkelhor 2007; Tutty 1993). Some studies report few or no evaluated negative effects on children (Tutty 1997), whereas others suggest potentially harmful sequelae. For example, some children report increased worry following programme participation (Finkelhor 1995c) and older children have been found to experience more negative feelings about non-sexual physical touch (Taal 1997). Therefore, there is a need to rigorously evaluate the evidence for these programmes, both in terms of beneficial and harmful outcomes, and to update the current evidence base on programme effectiveness.

OBJECTIVES
To systematically assess evidence of the effectiveness of school-based education programmes for the prevention of child sexual abuse. Specifically, to assess whether: programmes are effective in improving students' protective behaviours and knowledge about sexual abuse prevention; behaviours and skills are retained over time; and participation results in disclosures of sexual abuse, produces harms, or both.
The original review and the current update do not address whether these programmes or other interventions have reduced the incidence and/or prevalence of child sexual abuse at the population level as reported by official records (e.g. from statutory child protection services, law enforcement, primary care, or hospital data), and/or community prevalence data (e.g. from self report surveys repeated at regular intervals). This objective may be incorporated in future review updates as research advances in this field.

Types of studies
We included studies in the original review, and in this update, if they were randomised controlled trials (RCTs), cluster-RCTs, or quasi-RCTs where participants were allocated to the intervention or control group by day of the week, alphabetical order, or other sequential allocation such as class or school. In decision making for inclusion in the review, we focused on features of study design rather than design labels.

Types of participants
The study population comprised children (aged 5 to 12 years) and adolescents (aged 13 to 18 years) attending primary (elementary) or secondary (high) schools.

Types of interventions
Included interventions were school-based education programmes focusing on knowledge of sexual abuse and sexual abuse prevention concepts, or skill acquisition in protective behaviours, or both, compared with no intervention or the standard school curriculum. For this update, we excluded: interventions for preventing relationship and dating violence, and sexually coercive peer relationships, as these were reviewed in another Cochrane review (Fellmeth 2013); interventions for abduction prevention, the aims of which did not clearly refer to prevention of child sexual abuse; interventions aimed broadly at child protection or personal safety in which it was not possible to isolate the effects of the sexual abuse component; and interventions set entirely in before- and after-school programmes, and early childhood programmes that were not in schools (e.g. day-care settings).

Types of outcome measures
Child outcome measures were: 1. protective behaviours (as measured by an independently scored simulation test); 2. knowledge of sexual abuse or knowledge of sexual abuse prevention concepts, or both (as measured by questionnaires or vignettes); 3. retention of protective behaviours over time; 4. retention of knowledge over time; 5. harm, manifest as parental or child anxiety or fear (as measured by questionnaires); and 6. disclosure of sexual abuse by child or adolescent during or after programmes (as measured by official records of student self reports to school staff, child protective services, or police). Outcomes measured did not form criteria for inclusion in the review. We included studies meeting the inclusion criteria for types of study, participants, and interventions only.

Search methods for identification of studies

Electronic searches
We completed the most recent searches for this review update on 8 September 2014. We incorporated new search terms to describe recent concepts, such as child sexual abuse in online contexts, and the increasing use of terms such as 'exploitation' and 'victimisation' by researchers when describing child sexual abuse. Searches for the previous review were completed in August 2006. Where possible, we focused on finding new studies and identifying older studies added to databases since that time. We added five new sources (two trials registers, two conference proceedings indexes, and one source of open access dissertations), and searched these for all available years (see Appendix 1). Search strategies used for the original review are in Appendix 2. The databases searched and the time periods they cover (for the original review and for this review update) are listed below:

Searching other resources
Other sources of information searched included the reference lists of previous systematic and narrative reviews, and reference lists of included studies. We also searched databases of programme evaluations such as the Promising Practices Network (RAND Corporation 2013), and Blueprints for Healthy Youth Development (CSPV 2013). To identify unpublished studies, we circulated requests via email to relevant listservs (e.g. Child-Maltreatment-Research-Listerv).

Selection of studies
We conducted selection of studies in three phases. In phase one, we imported titles and abstracts of articles identified in the searches into reference management software and review authors KZ and SW (2007 and 2009 searches), KW and KZ (2013 searches), and KW and AS (2014 searches) independently screened them. We excluded papers if they clearly did not meet the inclusion criteria (i.e. study design, participants, type of intervention, types of comparisons). In phase two, two review authors (KZ and SW in 2007; KZ and KW in 2013; KW and AS in 2014) independently screened the titles, abstracts, and methodology sections of papers appearing to meet inclusion criteria. In phase three, we retrieved the full text of studies meeting all inclusion criteria for data extraction and we linked together multiple reports of the same study (e.g. Blumberg 1991). One study was translated into English (Del Campo Sanchez 2006). In cases where agreement could not be reached during screening, we asked a third and fourth review author to independently assess the study against the inclusion criteria, and we resolved these cases via discussion and consensus.

Data extraction and management
For this update, we used an electronic data extraction proforma adapted from the checklist of items specified in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011, Table 7.3a). Two review authors (KZ and SW in 2007) independently performed data extraction. KW repeated data extraction for all 24 studies in 2013, with KZ extracting data independently for new studies in 2013. No data extraction was required in 2014 as no further studies met the inclusion criteria. The data were entered into RevMan by KZ (Review Manager 4.2 in 2007) and KW (Review Manager 5.2 in 2013), and independently checked for accuracy by a research assistant who was not involved in the review. We resolved discrepancies via discussion. We asked authors of studies in which methods of sequence generation, allocation concealment, or blinding were unclear to provide additional information (see Assessment of risk of bias in included studies). We contacted corresponding authors of studies with insufficient information to allow inclusion in meta-analyses (Harvey 1988; Saslawsky 1986 in 2007; Chen 2012; Kraizer 1991 in 2013) and studies that used cluster-randomisation (Dake 2003; see Unit of analysis issues) via email with a request to provide additional data. In some instances, authors were able to provide data as requested; however, the majority did not respond to requests. We cannot be certain that all authors received our correspondence.

Assessment of risk of bias in included studies
In the original review, two review authors (KZ, SW) independently assessed each included study. In the review update, the procedure was repeated by one review author (KW) who independently assessed risk of bias for all included studies and compared these results to those obtained in the original review, with KZ assessing risk of bias independently for new studies in 2013. KW repeated assessment of risk of bias after a six-month interval. There were no discrepancies. We undertook no 'Risk of bias' assessment in 2014 as no further studies met the inclusion criteria. Review authors assessing risk of bias were not blinded to the names of the authors, institutions, journals, or results of studies. We assessed risk of bias using the seven domains on the Cochrane revised 'Risk of bias' assessment tool (Higgins 2011, Table 8.5a): (i) random sequence generation; (ii) allocation concealment; (iii) blinding of participants and personnel; (iv) blinding of outcome assessment; (v) incomplete outcome data; (vi) selective reporting; and (vii) other sources of bias. We assessed included studies on each domain as 'low risk', 'high risk', or 'unclear risk' of bias. We made judgements by answering 'yes' (assessed as low risk of bias), 'no' (assessed as high risk of bias) or 'uncertain' (assessed as unclear risk of bias) to pre-specified questions for each domain. We used verbatim text from study reports as support for each judgement of risk wherever possible. We entered information into RevMan and summarised it in a 'Risk of bias' table for each included study. We generated two summary figures: a 'Risk of bias' summary (Figure 1) visually depicting judgements across all studies, and a 'Risk of bias' graph (Figure 2) illustrating the proportion of studies for each risk of bias criterion. Risk of bias domains are detailed below.
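The judgement scheme described above (answers of 'yes', 'no', or 'uncertain' mapped to 'low risk', 'high risk', or 'unclear risk', then summarised as proportions per domain) can be sketched as follows. This is an illustration only: the study labels and answers are hypothetical, the function names are assumptions, and the review itself used RevMan rather than custom code.

```python
from collections import Counter

# Mapping stated in the text: answer to the pre-specified question -> judgement.
ANSWER_TO_RISK = {"yes": "low risk", "no": "high risk", "uncertain": "unclear risk"}


def judge(answer: str) -> str:
    """Translate a reviewer's answer into a risk-of-bias judgement."""
    return ANSWER_TO_RISK[answer.strip().lower()]


def domain_proportions(answers_by_study: dict, domain: str) -> dict:
    """Proportion of studies at each risk level for one domain."""
    counts = Counter(judge(answers[domain]) for answers in answers_by_study.values())
    total = sum(counts.values())
    return {risk: counts.get(risk, 0) / total for risk in ANSWER_TO_RISK.values()}


if __name__ == "__main__":
    # Hypothetical answers for four made-up studies (not data from the review).
    answers = {
        "Study A": {"random sequence generation": "yes"},
        "Study B": {"random sequence generation": "uncertain"},
        "Study C": {"random sequence generation": "no"},
        "Study D": {"random sequence generation": "uncertain"},
    }
    print(domain_proportions(answers, "random sequence generation"))
```

Repeating the tally for each of the seven domains yields the kind of per-domain proportions that a 'Risk of bias' graph displays.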

Random sequence generation (selection bias)
Description: The method used to generate the allocation sequence was described in sufficient detail to enable assessment of the extent to which it could produce comparable groups. In other words, a rule, based on some chance process, was adequately applied. Questions: Do study authors make an explicit statement about random assignment? What methods were used to randomly assign participants to intervention and control groups? Judgement: Was the allocation sequence adequately generated?

Allocation concealment (selection bias)
Description: The method used to conceal the allocation sequence was described in sufficient detail to enable assessment of whether the assignment of participants to groups could have been predicted ahead of time, or during the assignment process. Upcoming allocations were concealed from those allocating participants to groups. Questions: Do the study authors report a method of concealing allocation of participants to intervention or control groups? Is there evidence that the method was potentially unconcealed? Judgement: Was allocation adequately concealed?

Blinding of participants and personnel (performance bias)
Description: The measures used to blind study participants and personnel (such as programme facilitators or teachers) from knowledge of participant intervention or control group membership were described in sufficient detail to enable assessment of the effects of this knowledge on study outcomes. Questions: Do study authors report procedures for blinding? What specific blinding procedures were used? Was blinding achievable for this type of intervention? Judgement: Was participant and personnel knowledge of the allocation to intervention or control group adequately withheld?

Blinding of outcome assessment (detection bias)
Description: The measures used to blind outcome assessors from knowledge of participant intervention or control group membership were described in sufficient detail to enable assessment of the effects of this knowledge on outcome assessment or data collection, or both. Questions: Do study authors report procedures for blinding of individuals responsible for outcome assessment or data collection, or both? What specific blinding procedures were used? Was blinding achievable for this type of intervention? Judgement: Was outcome assessors' knowledge of the allocation to intervention or control group adequately withheld?

Incomplete outcome data (attrition bias)
Description: Complete outcome data are reported for each main outcome in sufficient detail to enable assessment of group differences owing to missing data. Complete outcome data include: attrition, exclusions, numbers of participants in each intervention and control group compared with the total number of participants randomised, and reasons for attrition and exclusions. Questions: Do study authors report attrition, exclusions, numbers of participants in each intervention and control group compared with the total number of participants randomised, and reasons for attrition and exclusions? Are imputation methods explained? Judgement: Were outcome data adequately addressed?

Selective reporting (reporting bias)
Description: The extent of outcome reporting is sufficient to enable assessment of the possibility of selective outcome reporting, that is, reporting of some outcomes and not others depending on the nature and direction of results. Questions: Do study authors report complete outcome data that match the aims or hypotheses of the study? Do study authors report on all pre-specified outcomes of interest? Judgement: Are reports of the study free of suggestion of selective outcome reporting?

Other sources of bias
Description: Any other important concerns about bias not addressed in other domains. Questions: Do study authors report studies in sufficient detail to enable assessment of other important risks of bias (e.g. related to the specific study design, extreme baseline imbalances, or contamination effects)? Judgement: Was the study free of other problems that could put it at a high risk of bias?

Measures of treatment effect
According to the review protocol (Zwi 2003), for individual trials we planned to report the risk ratio (RR) and risk difference (RD) with 95% confidence intervals (CI) for dichotomous outcomes and mean differences (MD) with 95% CI for continuous variables. For the meta-analysis, where possible, we planned to report the RR and RD with 95% CI for dichotomous outcomes and MD with 95% CI for continuous variables. Elsewhere in the protocol (e.g. p 4) odds ratios (OR) are also mentioned.
In the original review, and in this review update, we reported the summary of effect for dichotomous outcomes as an OR with 95% CI. Odds ratios are the statistic used most often in this field. For continuous outcomes this was to be reported as the standardised mean difference (SMD) with 95% CI. Standardised mean differences are appropriate for data synthesis where different outcome measures are used across studies.
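As an illustration of the dichotomous effect measure described above, the following minimal sketch computes an odds ratio with a 95% CI on the log scale from a 2 × 2 table. The function name, the Wald-type interval, and the 0.5 continuity correction for zero cells are assumptions made for this example; the review itself relied on the calculations built into RevMan.

```python
import math


def odds_ratio_ci(events_t, total_t, events_c, total_c, z=1.96):
    """Return (OR, lower, upper) for a 2 x 2 table using a Wald interval on the log scale.

    events_t / total_t: events and participants in the intervention group.
    events_c / total_c: events and participants in the control group.
    """
    a = events_t                # intervention, event
    b = total_t - events_t      # intervention, no event
    c = events_c                # control, event
    d = total_c - events_c      # control, no event

    # Assumption for this sketch: add 0.5 to every cell if any cell is zero,
    # so that the log odds ratio and its standard error are defined.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )
```

For example, `odds_ratio_ci(20, 40, 10, 40)` describes a hypothetical trial in which 20 of 40 intervention children and 10 of 40 control children showed the outcome; the numbers are invented purely to show the call.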

Unit of analysis issues
In the review protocol (Zwi 2003), in the case of cluster-RCTs, we planned to adjust for unit of analysis errors where the intraclass correlation coefficient (ICC) was available. In the original review, and in this review update, some included studies involved cluster-randomisation at the level of the class, school, or district. However, ICCs were not reported in the studies, nor were they available from study authors. No published ICC for school-based child sexual abuse prevention interventions could be found. We noted that estimates of 0.1 and 0.2 had been used in a review of school-based violence prevention programmes (Mytton 2006), based on the rationale for a published ICC of 0.15 for similar trials (CPPRG 1999b in Mytton 2006), which was considered a plausible yet conservative estimate for the impact of clustering at the classroom level (Schochet 2008). We reasoned that a suitably conservative approach would be to use the extremes of ICC 0.1 and 0.2 to calculate a design effect for each cluster-RCT according to the formula given in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011, Section 16.3.4), which is: 1 + (mean cluster size - 1) × ICC. We weighted these using the generic inverse variance function and used random-effects models. Some studies included in this review had multiple intervention groups (Blumberg 1991;Crowley 1989;Dawson 1987;Krahé 2009;Poche 1988). In these cases, we combined all relevant intervention groups into a single group, and all relevant control groups into a single group. Using the tools available in Review Manager 5.2, we combined means and standard deviations (SD) for continuous outcomes, and summed sample sizes and number of outcomes across groups for dichotomous outcomes. This enabled us to make comparisons between groups using pair-wise comparisons without risk of double-counting participants.
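The two calculations described above can be sketched as follows. The design effect follows the formula quoted from the Cochrane Handbook; inflating a standard error by the square root of the design effect is one common way to apply it before generic inverse variance pooling, and is an assumption here rather than a documented detail of this review. The group-combining formula is the standard one for merging two groups' means and SDs. All function names are hypothetical.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Design effect: 1 + (mean cluster size - 1) x ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


def inflate_se(se, mean_cluster_size, icc):
    """Assumed approach: multiply the standard error by sqrt(design effect)."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two groups into one, returning (n, mean, sd) of the combined group."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (m1 - m2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical classroom-randomised trial with 25 children per class.
    for icc in (0.1, 0.2):
        print(icc, design_effect(25, icc), effective_sample_size(340, 25, icc))
```

With a hypothetical mean cluster size of 25, the two ICC values give design effects of about 3.4 and 5.8, which shows how strongly the assumed ICC can shrink the weight of a cluster-RCT.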

Dealing with missing data
Requirements for dealing with missing data in Cochrane Reviews have changed since the protocol for this review was written (Zwi 2003). We identified several types of missing data in this review update: missing outcomes, missing summary data, and missing participants. For missing outcomes (e.g. disclosures, adverse outcomes) and missing summary data (i.e. group size totals, means, SDs), we contacted corresponding study authors to provide the outstanding data. Some authors responded helpfully to these requests, but data could only be provided for the most recent studies; in other cases, data had been collected over two decades ago and were no longer available. In some cases, authors did not respond. If data remained unavailable after these processes, we excluded these studies from the analyses. For missing participants, we reported the attrition rate wherever possible in the 'Risk of bias' tables beneath the Characteristics of included studies table.

Assessment of heterogeneity
We assessed heterogeneity (study diversity) visually and by examining the I² statistic (Higgins 2002), a quantity which describes the proportion of variation in point estimates that is due to heterogeneity rather than sampling error. We supplemented this with a statistical test of homogeneity to determine the strength of evidence for genuine heterogeneity using a significance level of P value > 0.05.
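A minimal sketch of how the I² statistic is conventionally derived from Cochran's Q is given below. It assumes each study contributes an effect estimate and its variance, and uses fixed-effect inverse-variance weights for Q; the function names are hypothetical and the review's own values came from RevMan.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the fixed-effect pooled estimate."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, variances):
    """I² as a percentage: 100 x (Q - df) / Q, truncated at zero."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100
```

When all studies report the same effect, Q is zero and I² is 0%; as estimates diverge relative to their precision, I² approaches 100%.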

Assessment of reporting biases
To assess reporting biases, we used two approaches to investigate the relationship between effect size and sample size (Borenstein 2009). We drew fixed-effect forest plots with studies plotted according to weight (i.e. from most to least precise). We noted any trend towards greater effect sizes at the bottom of the plots indicative of bias attributable to missing studies. We also drew fixed-effect funnel plots and checked them for asymmetry indicating the presence of publication bias. In both approaches, trends or asymmetry could be due to publication or related biases (e.g. language bias, availability bias, citation bias) or due to genuine differences between small and large trials (Borenstein 2009;Egger 1997). If a relationship was identified, we further examined differences between studies as a possible explanation along with comparisons by source (e.g. peer-reviewed journals; theses). We planned to conduct these analyses only when there was a reasonable number of studies (more than 10) and a reasonable amount of dispersion in sample sizes. To reduce the effects of publication bias, in the review update, we made efforts to retrieve the full texts of unpublished trials (e.g. theses). This was made easier by virtue of the fact that many had been made available on electronic databases since our previous searches were conducted and document delivery services had improved.

Data synthesis
We synthesised the data using tools provided in Review Manager (RevMan) 5.2 (RevMan 2012). We assessed the appropriateness of combining studies based on sufficient comparability with respect to: the type of intervention, the type of outcome measures, and the nominated data collection points pre- and post-intervention. We calculated summary statistics (OR for dichotomous data and SMD for continuous data) with 95% CIs for each study. We had intended to use a fixed-effect model to combine data in the first instance and then to adopt a random-effects model where the I² value exceeded 30%. On further consideration of the differences between the included studies in terms of their setting and intervention, we decided instead to adopt a random-effects model to combine data. In all cases, we generated pooled estimates for those studies for which complete statistical data were available or could be derived (i.e. counts and proportions for dichotomous data, and means and SDs for continuous data). Forest plots are presented for each of the pooled estimates. In all cases, we corrected for small sample size bias by using Hedges' g, which is the default in Review Manager (RevMan) 5.2 (RevMan 2012).
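The synthesis steps described above can be sketched as follows: a standardised mean difference with the Hedges' g small-sample correction for each study, then a random-effects pooled estimate. The DerSimonian-Laird estimator of between-study variance and the large-sample variance approximation for g are assumptions chosen for this illustration, and the function names are hypothetical; the review's actual figures were produced by RevMan.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance) for two groups' means, SDs, and sample sizes."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * d
    # Assumed large-sample approximation of the variance of g.
    variance = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2 - 3.94))
    return g, variance


def random_effects_pool(effects, variances, z=1.96):
    """DerSimonian-Laird pooling; returns (estimate, lower, upper, tau2)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - z * se, pooled + z * se, tau2
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled estimate.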
We planned to conduct analyses on the six outcomes nominated above: (i) protective behaviours; (ii) knowledge of sexual abuse or knowledge of sexual abuse prevention concepts, or both; (iii) retention of protective behaviours over time; (iv) retention of knowledge over time; (v) parental or child anxiety or fear; (vi) disclosure of sexual abuse. To manage subtle differences in outcome measurement for (ii) (knowledge), we created subgroups according to the category of measurement instrument used (i.e. questionnaire-based knowledge or vignette-based knowledge). There were insufficient data to proceed with analysis for retention of protective behaviours over time. No studies measured parental anxiety or fear.

Subgroup analysis and investigation of heterogeneity
In the review protocol (Zwi 2003), we specified the conduct of subgroup analyses to assess the impact of clinically relevant differences: (i) in the interventions (e.g. passive or active involvement of participants); and (ii) between groups of participants (e.g. gender, school setting). We did not conduct subgroup analyses because there was insufficient information provided in the included studies about issues that were hypothesised as being relevant for subgroup analysis, for example, studies did not always provide a breakdown of student gender by intervention group. Further, upon close scrutiny, interventions did not appear to fit an active/passive dichotomy with many having multiple components of both active and passive types (e.g. a video or DVD presentation may at times require children to sit still and listen, and at other times, to respond, chant, sing, or move). Further, there were insufficient numbers of studies to allow for meaningful comparisons. This will be elaborated further below.

Sensitivity analysis
We conducted sensitivity analysis to explore the extent to which results were influenced by risk of bias. We conducted a series of sensitivity analyses removing from the analyses studies with high risk of bias for: (i) allocation concealment (selection bias); (ii) blinding of outcome assessors (detection bias); (iii) incomplete outcome data (attrition of over 20%), and (iv) selective reporting (reporting bias). We also conducted sensitivity analyses to determine the impact of unit of analysis errors, arising from inadequate adjustment for cluster-randomisation in published results.

Rating the quality of the evidence
We rated the quality of the evidence for our main outcomes according to methods for rating evidence from randomised controlled trials developed by the GRADE working group (http://www.gradeworkinggroup.org/). For each outcome of interest the evidence started at high quality and could be downgraded to moderate, low or very low quality after consideration of the possible impact of risk of bias, imprecision, inconsistency, indirectness and publication bias on our confidence in the effects of intervention.
We have presented results for the primary analyses, quality ratings, and explanations for any downgrading decisions for the following outcomes in a 'Summary of Findings' table:
• Protective behaviours (self protective events measured using a stranger simulation test immediately post intervention)
• Questionnaire-based knowledge (factual knowledge measured by assessing responses to items on a questionnaire or multi-choice test, immediately post intervention)
• Vignette-based knowledge (applied knowledge measured by assessing responses to hypothetical scenarios, immediately post intervention)
• Harm (measured using anxiety or fear questionnaires)
• Disclosures (of past or current child sexual abuse made during or after programme completion)
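The bookkeeping behind the rating procedure described in this section can be sketched as below: evidence from randomised trials starts at high quality and moves down one level for each level of downgrading across the five considerations. GRADE judgements themselves are qualitative; the data structure, function name, and the example downgrades are hypothetical and only illustrate how the final level follows from the decisions.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = (
    "risk of bias",
    "imprecision",
    "inconsistency",
    "indirectness",
    "publication bias",
)


def grade_rating(downgrades):
    """Return the quality level given a mapping of domain -> levels downgraded (0, 1 or 2)."""
    unknown = set(downgrades) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown GRADE domain(s): {sorted(unknown)}")
    total = sum(downgrades.values())
    # Start at "high" and never drop below "very low".
    return LEVELS[max(0, len(LEVELS) - 1 - total)]


if __name__ == "__main__":
    # Hypothetical outcome downgraded once for risk of bias.
    print(grade_rating({"risk of bias": 1}))
```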

Results of the search
For this update, we searched the period from August 2006 to September 2014 (see Appendix 1). We identified a total of 12,969 records through database searching and a further 58 records from other sources. After duplicates were removed, we screened 10,218 records and excluded 10,161 records. We retrieved and evaluated the full-text reports of the remaining 57 records for eligibility. Of these, we excluded 43 reports, with reasons reported in the Characteristics of excluded studies table. From the remaining papers, we identified: 10 new included studies, one of which was translated from Spanish into English (Del Campo Sanchez 2006); three additional reports of two included studies from the previous review (Blumberg 1991;Fryer 1987b); and one ongoing study (NCT02181647). Searches for the original review covered the period up to August 2006 (Appendix 2). The previous review was based on 15 included studies. We excluded one of the previously included studies from this update (Pacifici 2001), because we reassessed it as not meeting the eligibility criterion for type of intervention, being focused on sexual violence prevention in the context of dating relationships for adolescents (see Fellmeth 2013), rather than explicitly on knowledge of child sexual abuse and its prevention. In total, this updated review reports on 24 unique trials reported in 29 papers (Figure 3).

Included studies
The Characteristics of included studies table summarises details for each of the 24 included studies.

Sample sizes
The total number of participants randomised in cluster-RCTs ranged from 74 (Poche 1988) to 1269 (Oldfield 1996). The total number of students randomised in trials with individuals as the unit of randomisation ranged from 46 (Chen 2012) to 382 (Del Campo Sanchez 2006). The number of participants in the 13 cluster-RCTs ranged from 74 (Poche 1988) to 1269 (Oldfield 1996), and in the nine RCTs in which participants were randomised as individuals, ranged from 36 (Çeçen-Eroğul 2013) to 231 (Tutty 1997). Eleven studies (including nine cluster-RCTs and two studies in which participants were randomised as individuals) each included more than 200 participants.

Settings
All studies were conducted in school settings: 23 in primary (elementary) schools and one in a special school for adolescents with intellectual disabilities. Only six studies were undertaken in single grades: one in kindergarten (Harvey 1988), one in grade one (Grendel 1991), two in grade three (Dake 2003;Kolko 1989), and two in grade four (Snyder 1986;Çeçen-Eroğul 2013). All other studies involved various combinations of grades to which there was no discernible pattern. It is possible to categorise the studies into three broad age group blocks as follows: (i) 10 studies with younger participants from kindergarten to grade three (Blumberg 1991;Dake 2003;Fryer 1987a;Grendel 1991;Harvey 1988;Hébert 2001;Kolko 1989;Krahé 2009;Kraizer 1991;Poche 1988); (ii) eight studies with older participants from grade four upwards (Crowley 1989;Dawson 1987;Del Campo Sanchez 2006;Hazzard 1991;Lee 1998;Snyder 1986;Wolfe 1986;Çeçen-Eroğul 2013); and (iii) six studies with younger and older participants together (Chen 2012;Daigneault 2012;Oldfield 1996;Saslawsky 1986;Tutty 1997;Wurtele 1986). None of the included studies were conducted in secondary (high) school settings.

Participants
A total of 5802 school-aged participants were included in the 24 trials. Study participants' mean ages at baseline in the included studies ranged from 5.8 years (Harvey 1988) to 13.44 years (Lee 1998). Authors of eight studies did not report the mean age of participants at baseline (Crowley 1989;Del Campo Sanchez 2006;Fryer 1987a;Hazzard 1991;Kraizer 1991;Oldfield 1996;Tutty 1997;Çeçen-Eroğul 2013). The proportion of females in the included studies ranged from 45% (Poche 1988;Çeçen-Eroğul 2013) to 55% (Crowley 1989). One trial enrolled female participants only (Lee 1998). Gender-specific proportions were not reported in five studies (Chen 2012;Daigneault 2012;Fryer 1987a;Harvey 1988;Kraizer 1991). Ethnicity data were reported in 13 studies. Two studies reported 100% Chinese participants (Chen 2012; Lee 1998). In five studies the predominant ethnicity reported was White or Caucasian comprising 74% to 97% of participants (Grendel 1991;Oldfield 1996;Poche 1988;Snyder 1986;Tutty 1997). Six studies reported diverse samples comprising participants from different combinations of White or Caucasian, Black or African, Hispanic, Asian, Middle Eastern, or 'other' backgrounds (Blumberg 1991;Daigneault 2012;Dake 2003;Dawson 1987;Harvey 1988;Hazzard 1991). In these six studies, the proportion of non-White participants ranged from 32% (Hazzard 1991) to 66% (Dake 2003). One of these studies reported country of birth rather than ethnicity (Daigneault 2012). Ethnicity data were not reported in the 10 remaining studies (Del Campo Sanchez 2006;Fryer 1987a;Hébert 2001;Kolko 1989;Krahé 2009;Kraizer 1991;Saslawsky 1986;Wolfe 1986;Wurtele 1986;Çeçen-Eroğul 2013). Parental socioeconomic position was not reported in any study. Non-empirical markers for study locations were used such as "low socioeconomic" (e.g. Daigneault 2012), "middle income" (Grendel 1991;Hébert 2001;Poche 1988), or "lower to middle income" (Saslawsky 1986; Wolfe 1986; Wurtele 1986).
Religious background of study participants was not reported in any study. One study reported data collection in religious schools in Spain (Del Campo Sanchez 2006). Participants' school achievement data (e.g. grades) at baseline were not reported in any study. In one study, the Peabody Picture Vocabulary Test (PPVT) (Dunn 1981) was used to assess children's receptive and expressive language ability at baseline (Fryer 1987a), and, in another study, Raven's Standard Progressive Matrices (RSPM) (Raven 1960) was used as a measure of general intellectual ability at baseline (Lee 1998); in this study, participants were adolescent Chinese females with mild intellectual disabilities from four special schools in Hong Kong, China. None of the studies enrolled participants on the basis of previously reported abuse.
No programmes were delivered electronically in web- or computer-based formats. The duration of the intervention programmes in the included trials ranged from a single 45-minute session (Oldfield 1996) to eight 20-minute sessions on consecutive days (Fryer 1987a). Fourteen interventions were brief (i.e. less than 90 minutes total duration) (Blumberg 1991;Crowley 1989;Dawson 1987;Grendel 1991;Harvey 1988;Hébert 2001;Kolko 1989;Krahé 2009;Lee 1998;Oldfield 1996;Poche 1988;Saslawsky 1986;Wolfe 1986;Wurtele 1986), and the remainder were longer, lasting from 90 to 180 minutes in total duration. In 17 trials, the effectiveness of prevention programmes was compared to that of a wait-listed control group. In the seven remaining studies, the control group interventions were as follows: discussion about self concept (Saslawsky 1986; Wurtele 1986); multimedia presentation with no child abuse content (Harvey 1988); fire safety (Blumberg 1991); fire or water safety (Hazzard 1991); attention control programme (Lee 1998); and a game of hangman (Snyder 1986). All programmes were delivered on school premises and during school hours, apart from one study in which the programme was delivered in the morning, before school classes began (Chen 2012).

Outcomes
In this section we summarise six outcome measures of interest that were addressed in the included studies: (i) protective behaviours; (ii) knowledge (questionnaire-based knowledge and vignette-based knowledge); (iii) retention of protective behaviours over time; (iv) retention of knowledge over time; (v) harm (manifest as parent or child anxiety or fear); and (vi) disclosures. This information is presented in the Characteristics of included studies tables.

Protective behaviours
Three studies measured change in behaviour using a simulated abuse situation and scored the child's response to the situation (Fryer 1987a;Kraizer 1991;Poche 1988). All three studies used a version of a stranger simulation test to assess children's self protective skills (i.e. whether children could follow the rules they were taught and not interact if approached by a stranger).

Knowledge
Knowledge outcome measures varied between studies. Knowledge measures used were: (i) questionnaire-based measures, or (ii) vignette-based measures that used scenarios or visual prompts to elicit a response from the child about safe behaviour in that situation. Only one study did not measure knowledge (Poche 1988), and one study used a vignette-based measure only (Krahé 2009). Ten studies used both vignette- and questionnaire-based measures (Blumberg 1991;Chen 2012;Daigneault 2012;Grendel 1991;Harvey 1988;Hazzard 1991;Hébert 2001;Lee 1998;Saslawsky 1986;Wurtele 1986). Three studies used a second questionnaire-based measure to establish construct validity (Chen 2012;Crowley 1989;Del Campo Sanchez 2006). The use of more than one measure by studies to assess knowledge gain was not anticipated at the outset of this systematic review. The two types of measures were administered differently. Questionnaire-based measures were administered as self completed measures via individual or group administration. Vignette measures were administered by interview. The different methods of administration and the type of response required from the child mean that these two outcomes may measure different aspects of children's knowledge; therefore, we considered them as separate knowledge outcomes.

Knowledge - vignette-based measures
Vignette-based knowledge measures were used in 11 studies. The What If Situations Test (WIST), comprising six brief verbal vignettes, was used in four studies (Grendel 1991;Lee 1998;Saslawsky 1986;Wurtele 1986). A Chinese version of the WIST was used in one study (Chen 2012), and a French version in another (Daigneault 2012). The Touch Discrimination Task (TDT), based on the WIST and comprising seven verbal vignettes, was used in one study (Blumberg 1991), and an unnamed measure comprising 10 picture vignettes featuring good touch and sexually abusive touch was used in another study (Harvey 1988). Eight cartoon picture vignettes and stories were used in Krahé 2009. Video vignettes entitled What Would You Do? (WWYD) and comprising six 30-second scenes were used by Hazzard 1991, and an unnamed video measure with five situations was used by Hébert 2001.

Retention of protective behaviours over time
Retention of self protective skills was measured in three studies at one month (Poche 1988), and six months (Fryer 1987a;Kraizer 1991). In Fryer 1987a, no comparison with the control group was available at follow-up because the control groups had been exposed to the intervention. In Kraizer 1991, data were not reported. In Poche 1988, there was substantial loss to follow-up. All three studies measured post-test protective behaviours within one to two days following the intervention. One study reported following up with assessment of protective behaviours one month after the intervention (Poche 1988), and the two other studies reported following up six months after the intervention (Fryer 1987a;Kraizer 1991). However, follow-up data were published only for Fryer 1987a; data were not published for Kraizer 1991, and Poche 1988 reported significant loss to follow-up with only nine of 23 children available for measurement.

Retention of knowledge over time
All of the 21 studies measuring post-test questionnaire-based knowledge did so within a two-week period following intervention. Ten studies also reported short-term knowledge outcomes one to three months following intervention (Crowley 1989;Dawson 1987;Harvey 1988;Hazzard 1991;Hébert 2001;Lee 1998;Poche 1988;Saslawsky 1986;Wurtele 1986;Çeçen-Eroğul 2013). One study reported knowledge outcomes at five months (Blumberg 1991), three studies at six months (Fryer 1987a;Kolko 1989;Kraizer 1991), and two studies at eight months (Del Campo Sanchez 2006; Krahé 2009). One study measured long-term outcomes at 12 months (Hazzard 1991). One study measured long-term outcomes in "the second year of the study" (Daigneault 2012, p 527); however, the precise timing was not reported. For most studies, no comparison with the control group was available at follow-up because the control groups had been exposed to the intervention by then. Complete data (for intervention and control groups) were reported in only four studies (Dawson 1987;Hazzard 1991;Kolko 1989;Lee 1998).

Disclosures
Children's disclosures of child sexual abuse during or following intervention were reported by five studies (Blumberg 1991;Del Campo Sanchez 2006;Hazzard 1991;Kolko 1989;Oldfield 1996). To record disclosures, two studies used a data collection form completed by staff at the school (Hazzard 1991;Oldfield 1996). Two other studies conducted child protective services (CPS) file searches (Blumberg 1991;Kolko 1989). Blumberg 1991 conducted follow-up CPS searches at 15 months post-intervention.

Excluded studies
We excluded 55 studies because they did not meet the inclusion criteria. We excluded 36 studies on the basis of study type (13 pretest and post-test studies without control groups; 11 controlled before-and-after studies without random assignment; five posttest only studies; five quasi-experimental studies without random assignment; one cross-sectional comparative study; and one comparative group design). We excluded 14 studies because the intervention was not primarily about child sexual abuse prevention, but was about dating and relationship violence, gendered violence, or sexual harassment in the context of partner relationships (seven of these studies were cited in the Cochrane Review by Fellmeth 2013, including Pacifici 2001, which was included in the original review) or abduction prevention, the aims of which did not mention prevention of child sexual abuse. We excluded four studies because they were not school-based and one study because participants were outside the age criteria. Reasons for exclusion are detailed in the Characteristics of excluded studies table.

Random sequence generation
Twenty studies stated that individuals or groups (classes, schools, or districts) were "randomised", "randomly allocated", or "randomly assigned" to groups, but provided no detail about how the random sequence was generated. Three further studies described a classic experimental design, but did not report details about random assignment (Dake 2003;Kolko 1989;Kraizer 1991). We classified all of these studies as unclear risk of bias. One study reported a random component in the sequence generation, coin tossing (Snyder 1986), and we classified it as low risk of bias. In one study, evidence of computerised randomisation was provided after author contact (Dake 2003). We re-classified this study as low risk of bias.

Allocation concealment
No studies provided information on methods used to conceal allocation. In all instances we concluded that procedures were potentially unconcealed such that assignment to groups could reasonably have been predicted prior to or during the process. Twelve studies reported tests of baseline imbalances showing no statistical differences between groups, potentially indicating successful randomisation. However, we classified these studies as unclear risk of bias because the method of concealment was not described in sufficient detail for an adequate assessment to be made. Ten studies provided no baseline comparisons and we also classified them as unclear risk of bias. We classified two studies as high risk of bias: one study reported important differences between groups at baseline and concluded failure of randomisation (Crowley 1989, pp 60-1) and another study revealed school officials were involved in the process (Kraizer 1991, p 27).

Blinding of participants and personnel
The school-based nature of the interventions made blinding of participants receiving the intervention and personnel delivering the intervention impossible. In 14 studies intervention and control groups were located within the same school. In these cases, it was possible that participants experienced 'contamination' effects via contact with each other in the playground or their siblings at home, and/or inadvertent 'exposure' to programme concepts via teachers and other school staff. This is likely to have biased the results towards an underestimation of programme effects, particularly on knowledge outcomes, which would be more susceptible to such contamination and exposure. Personnel delivering the interventions were various study authors, programme facilitators, and classroom teachers. None of these 14 studies described a means by which programme fidelity or integrity was addressed (e.g. via the use of scripts or standardised lesson plans) or measured (e.g. via observation, audio, or video recordings). We classified these 14 studies as high risk of bias. Seven further studies provided no information on blinding procedures and we classified them as unclear risk of bias (Chen 2012;Dake 2003;Del Campo Sanchez 2006;Saslawsky 1986;Wolfe 1986;Wurtele 1986;Çeçen-Eroğul 2013). We classified three studies as low risk of bias: one study reported that instructors were blind to group conditions (Daigneault 2012), one study reported measures to control for contamination and the use of narrative scripts (Lee 1998), and another study reported that the programme and testing were conducted on the same day to minimise the risk of contamination between groups in the school (Snyder 1986).

Blinding of outcome assessment
Blinding was not reported in seven studies (Del Campo Sanchez 2006;Harvey 1988;Kolko 1989;Lee 1998;Tutty 1997;Wolfe 1986;Çeçen-Eroğul 2013), which we classified as unclear risk of bias. We classified 10 studies as low risk of bias (Blumberg 1991;Daigneault 2012;Fryer 1987a;Grendel 1991;Krahé 2009;Kraizer 1991;Oldfield 1996;Poche 1988;Saslawsky 1986;Wurtele 1986). Some studies used multiple strategies for minimising outcome assessment bias. In eight studies, authors reported that outcome assessors were blind to group membership, study hypotheses, or both (Blumberg 1991;Daigneault 2012;Fryer 1987a;Grendel 1991;Krahé 2009;Oldfield 1996;Saslawsky 1986;Wurtele 1986). In three studies, authors noted that participants were not informed that the outcome assessment was related to the intervention (Blumberg 1991;Fryer 1987a;Poche 1988), and in three studies outcome assessors were reported to be different to the personnel delivering the interventions (Blumberg 1991;Fryer 1987a;Kraizer 1991). In two studies, video monitoring was used to collect observational data on the protective behaviours outcome, and coders' inter-rater reliability was reported (Fryer 1987a;Kraizer 1991). One study reported that participants were assessed only once (either pre-test or post-test) by the same outcome assessor to control for potential effects of rapport building (Blumberg 1991). Of these 10 studies, Fryer 1987a implemented more strategies than any other study and we considered it to be at lowest risk of bias in this domain. We classified seven studies as high risk of bias. In these studies outcome assessment was administered in group format (in class or with a number of children) and there were no strategies in place to blind outcome assessors to group membership or to ensure children completed the assessment independently (Chen 2012;Crowley 1989;Dake 2003;Dawson 1987;Hazzard 1991;Hébert 2001;Snyder 1986). This risk was further heightened when the outcome assessors were the same individuals as those delivering the programme (e.g. Dawson 1987).

Selective reporting
Most studies reported complete outcome data that matched the stated aims or hypothesis of the study, and reported on pre-specified outcomes of interest. We initially classified these studies as low risk of bias. We classified two studies as high risk of bias (Fryer 1987a;Wolfe 1986), because not all measures discussed in the methods section of the paper were also reported in the results. This may be an artefact of publication word limits. On closer inspection, however, we noted that outcome reporting was incomplete in five studies. One study did not provide a breakdown of data for intervention and control groups (Kraizer 1991). In four studies, outcomes were reported as summary statistics (e.g. F-tests or t-tests) without including means and SDs for continuous outcomes (Del Campo Sanchez 2006;Chen 2012;Harvey 1988;Kraizer 1991). Where data were not reported, we contacted study authors with an open-ended request to provide further information. We received helpful replies from Chen 2012 (additional data provided; study classified as low risk of bias) and Kraizer 1991 (data unable to be retrieved; study classified as high risk of bias). We classified no studies as unclear risk of bias. In summary, we considered five studies as high risk of bias on this domain (Del Campo Sanchez 2006;Fryer 1987a;Harvey 1988;Kraizer 1991;Wolfe 1986), and we considered the remaining 19 studies low risk of bias.

Other potential sources of bias
The unit of randomisation in 14 studies was clusters. Eleven of these were cluster-RCTs (Blumberg 1991;Dake 2003;Dawson 1987;Grendel 1991;Hazzard 1991;Kolko 1989;Krahé 2009;Kraizer 1991;Oldfield 1996;Poche 1988;Wolfe 1986), where the unit of allocation was a group (e.g. classroom or school). Three quasi-RCTs also used groups as the unit of randomisation (Crowley 1989;Daigneault 2012;Hébert 2001). None of these studies reported appropriate analyses accounting for clustering effects. Therefore, we assumed unit of analysis errors in all cases, meaning the original P values would be artificially small. In the subsequent meta-analysis, studies with unadjusted unit of analysis errors would be incorrectly and more highly weighted than is, in reality, appropriate. This risks biasing results in favour of the intervention. As noted above, to diminish the risk of publication bias, in the review update we made concerted efforts to retrieve the full texts of unpublished trials (e.g. theses). Seven of 29 records included in this review were unpublished theses (Blumberg 1987;Chadwick 1989;Crowley 1989;Dawson 1987;Grendel 1991;Kraizer 1991;Snyder 1986). We assessed the risk of publication bias by drawing fixed-effect forest and funnel plots for the two meta-analyses involving 10 or more trials (questionnaire-based knowledge, 18 trials; vignette-based knowledge, 11 trials). Visual inspection of fixed-effect forest plots revealed no discernible trend towards greater effect sizes in smaller studies. However, our subjective impression of the fixed-effect funnel plots suggested the presence of slight asymmetry on the lower right (here we found smaller studies with greater effect sizes) indicating the possibility that some studies are missing from the lower left (here we should have found smaller studies with smaller effect sizes) (see Figure 4 and Figure 5).
There is also the possibility that smaller studies were of poorer methodological quality (although this is not evident in the 'Risk of bias' assessments), or there may have been genuine differences between studies (e.g. unreported sample differences at baseline; differences in programme duration) (Borenstein 2009). Due to poor reporting of variables that may be responsible for heterogeneity, it was not possible to further explore the sources of variation, for example, via the use of meta-regression.
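
The unit of analysis errors described above can be illustrated with the standard design effect for cluster-randomised trials, 1 + (m - 1) × ICC, where m is the average cluster size. The sketch below is a minimal illustration only: the sample size and classroom size are hypothetical values, not data from any included study, and the function names are ours. The ICCs of 0.1 and 0.2 mirror those imputed in the sensitivity analyses.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Design effect for a cluster-randomised trial: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: float, avg_cluster_size: float, icc: float) -> float:
    """Sample size after dividing by the design effect."""
    return n / design_effect(avg_cluster_size, icc)


# Hypothetical trial (assumption, not from the review):
# 200 children randomised in classrooms of 25.
N_CHILDREN = 200
CLASS_SIZE = 25

for icc in (0.1, 0.2):
    deff = design_effect(CLASS_SIZE, icc)
    n_eff = effective_sample_size(N_CHILDREN, CLASS_SIZE, icc)
    print(f"ICC={icc}: design effect={deff:.1f}, effective n={n_eff:.1f}")
```

A trial analysed as if all 200 children were independent would therefore carry far more weight in a meta-analysis than its effective sample size justifies, which is why unadjusted cluster trials tend to be over-weighted.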

Effects of interventions
See: Summary of findings for the main comparison. This review sought to assess the evidence of effectiveness of school-based education programmes for the prevention of child sexual abuse. Specifically, we sought to assess whether: programmes were effective in improving students' protective behaviours and knowledge about sexual abuse prevention; behaviours and skills were retained over time; and programme participation resulted in disclosures of sexual abuse, produced harm, or both. In this section, we present the main findings on the effects of the interventions for six outcomes: (i) protective behaviours; (ii) knowledge (questionnaire-based knowledge and vignette-based knowledge); (iii) retention of protective behaviours over time; (iv) retention of knowledge over time; (v) harm (parental or child anxiety or fear); and (vi) disclosures. The analysis results and our GRADE ratings are presented in Summary of findings for the main comparison.

Protective behaviours
Of the 24 included studies, three studies reported collecting data on protective behaviours (Fryer 1987a;Kraizer 1991;Poche 1988). All used a version of a stranger simulation test involving staging of a simulated abuse or grooming situation with each individual child where a research assistant, posing as a stranger, requested the child's help with a task that required them to go with the stranger (e.g. accompany the stranger to the stranger's car to do a special task). Children's responses were recorded by independent assessors using contemporaneous video monitoring (Fryer 1987a;Kraizer 1991), or by the research assistant (Poche 1988). Scoring was pass or fail. All three studies were conducted with children in lower primary school (kindergarten to grade three). Only the Fryer 1987a (n = 48; randomised controlled trial (RCT)) and Poche 1988 (n = 74; cluster-RCT) studies could be included in the meta-analysis for protective behaviours, as Kraizer 1991 (n = 670; cluster-RCT) did not report a breakdown of pass or fail scores for intervention and control groups. For the Poche 1988 study, we combined two intervention groups as the self protective knowledge and skills received were considered sufficiently similar to those in Fryer 1987a: teaching rules, group discussion, and practice through role-play and rehearsal. Data were available for 102 participants. Comparison was with a control group. In the analysis, heterogeneity approached the moderate range (I² = 27%; Tau² = 0.16) and was non-significant (P value = 0.24). Protective behaviours were greatly enhanced in intervention groups compared to control groups immediately post-intervention (odds ratio (OR) 5.71, 95% confidence interval (CI) 1.98 to 16.51; two studies; n = 102) (see Analysis 1.1). We performed sensitivity analyses to assess the effects of adjusting the Poche 1988 study for cluster-randomisation. 
Using this method and an intraclass correlation coefficient (ICC) of 0.1 produced an OR of 5.43 (95% CI 1.88 to 15.65; Analysis 1.2) and an ICC of 0.2 produced an OR of 5.16 (95% CI 1.81 to 14.70; Analysis 1.3). These analyses indicate that adjusting for the effect of clustering has minimal effects on our results. Taken together, results of the more conservative adjustment for clustering show the short-term (i.e. immediately post-intervention) superiority of the interventions over control group effects. That is, children who received a school-based sexual abuse prevention programme were substantially more likely to demonstrate protective behaviours in a simulated situation that was administered immediately after the programme ended. In addition to the above assessment, Fryer 1987a and Kraizer 1991 assessed the impact of knowledge and self esteem on the use of protective behaviours. Fryer 1987a used the Harter Perceived Competence Scale for Children (HPCS) (Harter 1982), commonly used as a measure of self esteem. Kraizer 1991 used the Battle Culture Free Self-esteem Inventory (Battle 1981) and the Children Need to Know Knowledge/Attitude Test (CNKKAT) (Kraizer 1981). Results of these measures were reported only for the intervention groups. In both studies, children with high self esteem who had improved knowledge scores post-intervention were more likely to exhibit protective behaviours. These studies did not report effect sizes to enable assessment of the magnitude of the relationships between self esteem, knowledge, and protective behaviours, although self esteem was identified as a potential "critical path" or moderating variable, which was recommended for further research (Fryer 1987a, p 177).
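
Because odds ratios are pooled on the log scale, the interval reported for the unadjusted analysis (OR 5.71, 95% CI 1.98 to 16.51) can be used to recover an approximate standard error of the log OR. The following is a rough sketch that assumes a symmetric normal-approximation interval on the log scale; the function names are illustrative, and the results are subject to rounding in the published figures.

```python
import math


def log_or_se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate SE of log(OR) from a 95% CI, assuming symmetry on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def z_statistic(odds_ratio: float, lower: float, upper: float) -> float:
    """Wald z statistic for the null hypothesis OR = 1."""
    return math.log(odds_ratio) / log_or_se_from_ci(lower, upper)


def two_sided_p(z: float) -> float:
    """Two-sided P value from a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Figures reported for Analysis 1.1 (unadjusted for clustering).
OR, LOWER, UPPER = 5.71, 1.98, 16.51

se = log_or_se_from_ci(LOWER, UPPER)
z = z_statistic(OR, LOWER, UPPER)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P = {two_sided_p(z):.4f}")

# On the log scale the point estimate should sit midway between the limits,
# so the geometric mean of the limits should be close to the reported OR.
print(f"geometric mean of limits = {math.sqrt(LOWER * UPPER):.2f}")
```

The same calculation applied to the ICC-adjusted intervals gives similar standard errors, consistent with the statement that adjustment for clustering changes the result very little.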

Questionnaire-based knowledge
Of the 24 included studies, 21 reported questionnaire-based knowledge using a range of different measures detailed above. Three of the 21 studies did not provide data in a way that could be included in meta-analysis (Del Campo Sanchez 2006;Harvey 1988;Kraizer 1991). In three trials, with multiple intervention groups in which interventions were judged to be sufficiently comparable, we combined intervention groups into a single intervention group in the meta-analysis (Blumberg 1991;Crowley 1989;Dawson 1987). Eighteen studies were included in the meta-analysis comprising a total of 4657 participants. In the meta-analysis, there was evidence of substantial heterogeneity (I² = 84%; Tau² = 0.10). The high Chi² statistic (104.76; df = 17) and low P value (< 0.00001) indicated variation of effect estimates beyond chance. The SMD was 0.61 (95% CI 0.45 to 0.78), reflecting an average 0.61 standard deviation (SD) increase in factual knowledge, across various measures, for the intervention group. These results suggest that children exposed to the interventions tend to display increased factual knowledge about sexual abuse and its prevention, when measured immediately after completion of the programme, and the effect is of a moderate size (see Analysis 2.1). Of the 18 studies included in this meta-analysis, 12 were cluster-randomised studies and all were analysed with unit of analysis errors. Of the cluster-randomised studies, one was randomised by school district (Kolko 1989), four were randomised by school (Daigneault 2012;Dake 2003;Hazzard 1991;Hébert 2001), and seven by classroom (Blumberg 1991;Crowley 1989;Dawson 1987;Grendel 1991;Oldfield 1996;Snyder 1986;Wolfe 1986). We estimated ICCs, as described above, in sensitivity analyses to adjust for unit of analysis errors. We applied the same ICC to district, school, and class cluster-RCTs.
When adjusted, an ICC of 0.1 produced a SMD of 0.66 (95% CI 0.51 to 0.81; Analysis 2.2) and an ICC of 0.2 produced a SMD of 0.63 (95% CI 0.50 to 0.77; Analysis 2.3). These analyses indicate that adjusting for clustering has very minimal effects on results. We also conducted sensitivity analyses to assess the effects of study exclusion for risk of bias in the two most relevant domains for school-based studies. First, we examined risk of bias on the blinding of outcome assessment domain. When studies at high risk of bias were excluded (Chen 2012;Crowley 1989;Dake 2003;Dawson 1987;Hazzard 1991;Hébert 2001;Snyder 1986), the SMD was reduced to 0.47 (95% CI 0.29 to 0.66). These results indicate that knowledge scores in these studies may be influenced by assessor bias or contamination from group assessment, or both, such that better controlled studies may generate lower effect sizes in this domain. Second, we examined risk of bias on the attrition bias domain. When studies at high risk of bias were excluded (Blumberg 1991;Crowley 1989;Daigneault 2012;Dake 2003;Grendel 1991;Kolko 1989), the SMD was 0.69 (95% CI 0.59 to 0.88), indicating that children from studies with better follow-up tended to score somewhat higher in this domain. We conducted subgroup analyses to assess the effects of participant age. We examined studies in two age-based subgroups as follows: (i) six studies with only younger participants from kindergarten to grade three (Blumberg 1991;Daigneault 2012;Dake 2003;Grendel 1991;Hébert 2001;Kolko 1989); and (ii) seven studies with only older participants from grade four upwards (Crowley 1989;Dawson 1987;Hazzard 1991;Lee 1998;Snyder 1986;Wolfe 1986;Çeçen-Eroğul 2013). The SMD was 0.42 (95% CI 0.08 to 0.77) for the younger group and 0.89 (95% CI 0.59 to 1.19) for the older group. The test for subgroup differences was just below the statistically significant cut-off of 0.05 (Chi² = 4.04, df = 1; P value = 0.04).
These results indicate that knowledge may be better gained immediately after the intervention by older children.
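
The test for subgroup differences reported above can be approximately reproduced from the two subgroup summaries alone. The sketch below is illustrative rather than a record of how the analysis was run: it backs out standard errors from the published 95% CIs (assuming normal-approximation intervals) and computes the between-subgroup Chi² statistic on one degree of freedom. Small discrepancies from the published Chi² of 4.04 are expected because the inputs are rounded to two decimal places.

```python
import math


def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate SE of an estimate from its 95% CI."""
    return (upper - lower) / (2.0 * z_crit)


def subgroup_difference_test(est_a, ci_a, est_b, ci_b):
    """Chi² test (df = 1) for a difference between two subgroup estimates.

    Returns the Chi² statistic and its P value.
    """
    w_a = 1.0 / se_from_ci(*ci_a) ** 2   # inverse-variance weights
    w_b = 1.0 / se_from_ci(*ci_b) ** 2
    pooled = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    q = w_a * (est_a - pooled) ** 2 + w_b * (est_b - pooled) ** 2
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p


# Subgroup SMDs for questionnaire-based knowledge, as reported in the text.
younger = (0.42, (0.08, 0.77))
older = (0.89, (0.59, 1.19))

q, p = subgroup_difference_test(younger[0], younger[1], older[0], older[1])
print(f"Chi² = {q:.2f}, df = 1, P = {p:.3f}")
```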

Vignette-based knowledge
Twelve studies used vignette-based measures in various formats, including verbal, picture, and video vignettes. One study did not report SDs and thus could not be included in a meta-analysis (Harvey 1988). One study did not report SDs but these could be derived by review authors from other reported statistics to enable inclusion in meta-analysis (Saslawsky 1986). In Blumberg 1991 and Krahé 2009, we combined two intervention groups into a single intervention group based on our assessment that the interventions were sufficiently similar when compared with other studies. Eleven studies were included in the meta-analysis with a total of 1688 participants. There was evidence of substantial heterogeneity (I² = 71%; Tau² = 0.08) in the meta-analysis. The high Chi² statistic (34.25, df = 10) and low P value (< 0.0002) provide further evidence of variation in effect estimates beyond chance. The SMD was 0.45 (95% CI 0.24 to 0.65) (see Analysis 2.4), indicating that those receiving treatment had an average 0.45 SD increase in applied knowledge as reflected in their responses to vignettes administered post-intervention, a gain of moderate effect size. Of the 11 studies included in the meta-analysis, seven studies were of cluster-randomised design (Blumberg 1991;Daigneault 2012;Grendel 1991;Hazzard 1991;Hébert 2001;Kolko 1989;Krahé 2009). To assess the impact of unit of analysis errors, we conducted sensitivity analyses for estimated ICCs (as above). For an ICC of 0.1, the SMD was 0.53 (95% CI 0.32 to 0.74; Analysis 2.5) and for an ICC of 0.2, the SMD was 0.60 (95% CI 0.31 to 0.89; Analysis 2.6). These analyses suggest that adjusting for clustering has only slight effects on results. We conducted sensitivity analyses to assess the effects of study exclusion for risk of bias. First, we examined risk of bias on the blinding of outcome assessment domain.
When we excluded three studies (Chen 2012;Hazzard 1991;Hébert 2001), the SMD was reduced to 0.36 (95% CI 0.17 to 0.56), indicating a slight testing effect. Second, we examined risk of bias on the attrition bias domain. When we excluded studies at high risk of bias (Blumberg 1991;Daigneault 2012;Grendel 1991;Kolko 1989), the SMD increased to 0.57 (95% CI 0.25 to 0.89), indicating that children from studies with better follow-up tended to score somewhat higher in this domain. We conducted subgroup analyses to assess the effects of participant age. We examined studies in two groups: (i) six studies including only participants in kindergarten to grade three (Blumberg 1991;Daigneault 2012;Grendel 1991;Hébert 2001;Kolko 1989;Krahé 2009); and (ii) three studies including only participants in grade four upwards (Chen 2012; Hazzard 1991; Lee 1998). The SMD was 0.39 (95% CI 0.09 to 0.69) for the younger group and 0.56 (95% CI 0.03 to 1.08) for the older group. Thus, older children, on average, may score somewhat better than younger children when they complete these measures of applied knowledge immediately after the intervention. However, the test for subgroup differences was not significant (Chi² = 0.29, df = 1; P value = 0.59).
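
The I² values reported for the two knowledge meta-analyses follow directly from their Chi² statistics and degrees of freedom, using the usual definition I² = max(0, (Chi² - df) / Chi²) × 100%. The short sketch below recomputes them as a consistency check; the function name is ours.

```python
def i_squared(chi2: float, df: int) -> float:
    """Percentage of variability in effect estimates due to heterogeneity.

    Uses I² = max(0, (Chi² - df) / Chi²) * 100, truncated at zero when
    Chi² does not exceed its degrees of freedom.
    """
    if chi2 <= 0:
        return 0.0
    return max(0.0, (chi2 - df) / chi2) * 100.0


# Chi² and df as reported in the text.
analyses = {
    "questionnaire-based knowledge": (104.76, 17),
    "vignette-based knowledge": (34.25, 10),
}

for name, (chi2, df) in analyses.items():
    print(f"{name}: I² = {i_squared(chi2, df):.1f}%")
```

Both values round to the percentages given in the text (84% and 71%).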

Retention of protective behaviours over time
Three of the 24 included studies measured retention of protective behaviours over time. Complete data were not available for any of these studies and a meta-analysis could not be conducted.

Retention of knowledge over time
Questionnaire-based measures were used in 21 of the 24 included studies. Ten of these studies reported on retention of knowledge over time. Complete data were available for four studies (956 participants) (Dawson 1987;Hazzard 1991;Kolko 1989;Lee 1998). All studies used unique knowledge scales. In three studies, follow-up periods were one to three months post-intervention (Dawson 1987;Hazzard 1991;Lee 1998), and in one study, six months post-intervention (Kolko 1989). These four studies were included in meta-analysis using a random-effects model. For comparative purposes we generated two meta-analyses: one estimating effects for the four studies immediately post-intervention and one estimating effects at follow-up. Results suggest that knowledge appeared to deteriorate slightly over time as demonstrated by a decline in the SMD from 0.78 (95% CI 0.38 to 1.17; I² = 84%, Tau² = 0.13, P value = 0.0003) immediately post-intervention to SMD 0.69 (95% CI 0.51 to 0.87; I² = 25%; Tau² = 0.01, P value = 0.26) at one to three months follow-up (see Analysis 3.1). However, the test for subgroup differences was not significant (Chi² = 0.14, df = 1; P value = 0.71), suggesting knowledge scores did not deteriorate significantly for intervention or control groups within the one- to six-month follow-up period. Of the four studies included in this meta-analysis, three were cluster-randomised studies (Dawson 1987;Hazzard 1991;Kolko 1989). Sensitivity analyses, adjusting for clustering, yielded very similar results. When adjusted with an ICC of 0.1, knowledge decreased slightly over time as demonstrated by a small decline in the SMD from 0.86 (95% CI 0.53 to 1.20) immediately post-intervention to 0.73 (95% CI 0.41 to 1.06) at follow-up (Analysis 3.2).
When adjusted with an ICC of 0.2, knowledge decreased slightly over time as demonstrated by a small decline in the SMD from 0.86 (95% CI 0.53 to 1.20) immediately post-intervention to 0.72 (95% CI 0.32 to 1.11) at follow-up (Analysis 3.3). Vignette-based measures were used in 12 of the 24 included studies. Nine of these studies reported on retention of knowledge over time. None of these studies could be included in a meta-analysis. The reasons for this are twofold: (i) the wait-list control design of the study meant that the control group received the intervention immediately after the experimental group had finished and, therefore, follow-up data were unavailable for the control group (Blumberg 1991;Daigneault 2012;Grendel 1991;Hazzard 1991;Saslawsky 1986;Wurtele 1986); or (ii) the study did not provide data in a form useable in meta-analysis, for example, the study provided a narrative statement or reported summary statistics without providing means and SDs (Hébert 2001;Krahé 2009;Lee 1998). As a narrative synthesis, six studies provided intervention group data only: two studies reported no knowledge gains between post-test and follow-up (at five months, Blumberg 1991; at one year, Hazzard 1991), two studies reported maintenance of knowledge gains at two-month follow-up (Hébert 2001;Lee 1998), and three studies reported small, but unimportant, additional knowledge gains between post-test and follow-up (six months, Kolko 1989; three months, Saslawsky 1986; Wurtele 1986).

Harm
A total of six studies had measured harm, but three did not report data in a form that could be used in meta-analysis (Daigneault 2012;Hazzard 1991;Kraizer 1991). We included three studies (795 participants) in the meta-analysis for harm in relation to participation in school-based child sexual abuse prevention programmes (Blumberg 1991;Dawson 1987;Lee 1998). In these studies, harm was measured via child self report anxiety or fear scales, with all studies using unique measures: Dawson 1987 used the State-Trait Anxiety Inventory for Children (STAIC), Lee 1998 used the Fear Assessment Thermometer Scale (FATS), and Blumberg 1991 used a custom-made scale. There was no heterogeneity (I² = 0%, P value = 0.79). The SMD was -0.08 (95% CI -0.22 to 0.07) (see Analysis 4.1). This result reveals evidence of no increases or decreases in anxiety or fear in intervention participants. Two of these three studies were cluster-randomised studies (Blumberg 1991;Dawson 1987). To assess the impact of unit of analysis errors, we conducted sensitivity analyses for estimated ICCs as above, showing little change in point estimates and slightly widening CIs. For an ICC of 0.1, the SMD was -0.04 (95% CI -0.42 to 0.33; Analysis 4.2) and for an ICC of 0.2, the SMD was -0.03 (95% CI -0.46 to 0.40; Analysis 4.3). A narrative synthesis of the studies not included in the meta-analysis shows that seven studies reported on adverse effects with either child (Hazzard 1991;Kraizer 1991;Oldfield 1996) or parent self reports (Del Campo Sanchez 2006;Hazzard 1991;Hébert 2001;Tutty 1997). Using child self report measures, Hazzard 1991 and Oldfield 1996 reported no important differences in STAIC scores between intervention and control groups (Hazzard 1991, treatment mean 29.7, control mean 29.9; Oldfield 1996, F(1, 593) = 0.05, P value = 0.825). Hazzard 1991 did not report SDs and ANCOVA results. Oldfield 1996 did not report means and SDs.
Oldfield 1996 also found no important differences between experimental and control group anxiety scores using the Revised Children's Manifest Anxiety Scale (RCMAS) with younger participants, F(1, 653) = 1.40, P value = 0.248. In one study (Kraizer 1991), children in the intervention group participated in an exit interview (n = 332): 14.8% of the children experienced some anxiety or fear initially but none on programme completion, and 4.5% experienced some anxiety or fear initially and remained a little worried on programme completion. Using parent self report measures of perceived changes in children's behaviour, Del Campo Sanchez 2006 (n = 193) reported the following in children exposed to the intervention: fear of adults (1%) and increased fighting with peers (1%), but no sleep problems, or rejection of normal affection. Similarly, in intervention group children, Tutty 1997 (n = 231) found worry about scary things happening (1.7%), but no bedwetting, nightmares, crying, rejection of normal affection, or attention seeking behaviour. Hébert 2001 (n = 133) reported intervention group children having increased dependency behaviours (13%), more aggressiveness towards peers (15%) and siblings (29%), and more fearfulness of strangers (25%). Hazzard 1991 (n = 399) reported no important differences between intervention and control group children on parental perceptions of anxiety or fear (summary data not provided).

Disclosure
We included three studies (1788 participants) in the meta-analysis for disclosures of previous or current sexual abuse (Del Campo Sanchez 2006;Kolko 1989;Oldfield 1996). There was no heterogeneity (I² = 0%, P value = 0.84). Disclosure occurred more often in the intervention group (OR 3.56, 95% CI 1.13 to 11.24). The odds of disclosure were as much as 3.5 times higher in participants exposed to the intervention (see Analysis 5.1). We performed sensitivity analyses to assess the effects of adjusting the Kolko 1989 and Oldfield 1996 studies for cluster-randomisation. Using this method and an ICC of 0.1 produced a non-significant OR of 3.04 (95% CI 0.75 to 12.33; Analysis 5.2) and an ICC of 0.2 produced an OR of 2.95 (95% CI 0.69 to 12.61; Analysis 5.3). These analyses, adjusted for unit of analysis errors, indicate that the effect of intervention programmes on disclosure was sensitive to different assumptions regarding the effect of clustering on the results. Of the studies not included in meta-analysis, disclosure of past or current abuse was recorded in two studies (Blumberg 1991;Hazzard 1991). One study conducted a search of the files of Child Protective Services (CPS) for names of children in the classrooms who were part of the study (Blumberg 1991). Data event counts were not provided, however the study reported that risk ratios (RR) were calculated for experimental against control conditions. Both ratios "approached 1.0 which one would expect by chance" (Chadwick 1989, p 61). One further study measured disclosures, but was unable to distinguish between treatment and control groups due to data reporting methods (Hazzard 1991). Eight of 526 participants (1.5%) reported ongoing sexual abuse and 20 (3.8%) reported past sexual abuse.

Subgroup analyses
Subgroup analyses are used to compare the mean effect for different subgroups of studies where there are sufficient numbers of studies to allow for meaningful comparisons. We were able to conduct subgroup analyses for age, but only for knowledge outcomes, by categorising studies into two broad groups: younger children and older children as described above. This was because programmes were often delivered to children across multiple consecutive and non-consecutive school grades. We did not conduct other subgroup analyses in this review because the included studies provided insufficient information about issues that were hypothesised as being relevant for subgroup analysis. In the original study protocol we planned to conduct subgroup analyses for participant age and gender, and programme type and setting (Zwi 2003). We were unable to conduct subgroup analyses for gender owing to poor reporting. We did not conduct subgroup analyses for active or passive involvement as it was not possible to categorise programmes in this way; most were multifaceted, involving both active and passive approaches. What is needed is a way of identifying, more precisely, the range of child, programme, and study design characteristics that may moderate programme effectiveness. We explain this in more detail in the discussion below.

Summary of main results
This updated review reported on 24 trials (29 reports) examining the effectiveness of school-based programmes for the prevention of child sexual abuse. The studies report on data for 5802 child participants of whom 5730 (almost 98.8%) were from primary (elementary) schools. In this review, we assessed programme effectiveness according to six outcomes: (i) protective behaviours; (ii) knowledge (questionnaire-based knowledge and vignette-based knowledge); (iii) retention of protective behaviours over time; (iv) retention of knowledge over time; (v) harm manifesting as parental or child anxiety or fear; and (vi) disclosures of past or current child sexual abuse. Below we report on: (i) protective behaviours; (ii) knowledge; (iii) harm; and (iv) disclosures.

Protective behaviours
Meta-analysis of data from two studies showed significant improvements in protective behaviours in simulated at-risk situations, measured immediately (up to two weeks) post-intervention. Follow-up assessment of protective behaviours was not reported in either of the studies. Simulated situations, used in three of the included studies, were a form of in vivo assessment, which exposed children to potentially stressful situations such as an invitation to go with an unknown adult (Fryer 1987a;Kraizer 1991;Poche 1988). The use of these simulation techniques is difficult to justify and raises important ethical questions about balancing risks against potential benefits for participants. Research of this type also presents significant challenges for voluntary consent where there is active concealment via role-playing. Although this is arguably as close as researchers can get to testing whether participants' learned skills can be translated into appropriate behaviour, three salient issues must be considered. First, the generalisation of responses from simulated to actual settings cannot be assumed. Second, it is not known if skills taught in the context of approaches from strangers help children deal with threats from familiar adults, who are the most common perpetrators of child sexual abuse. Third, there is the possibility that this type of outcome assessment may desensitise children to similar occurrences in the future. Outcome assessment of this type, therefore, must be rigorously conducted and monitored. The results of one study suggest that children with greater self esteem (Fryer 1987a), as measured by the Harter Perceived Competence Scale (HPCS) (Harter 1982), exhibited better protective behaviours following intervention. Since self esteem is clinically relevant in child sexual abuse, this finding warrants further investigation to determine whether self esteem training should be included as a component of child sexual abuse prevention interventions. 
It may be that children with greater self esteem are more likely to display protective behaviours regardless of exposure to programmes. Unfortunately, the psychological literature has been hampered by the use of a confusing array of terms encompassing self esteem (e.g. self belief, self concept, self efficacy, self worth), and there has been extensive debate in the educational psychology literature about its role in children's learning (Valentine 2004). Greater levels of precision in definition and measurement are required in future research.

Knowledge
Meta-analysis of data from 18 studies for questionnaire-based knowledge and 11 studies for vignette-based knowledge suggested gains in factual and applied knowledge immediately (up to two weeks) post-intervention. Follow-up assessment of factual knowledge was limited to four studies with our meta-analyses showing that factual knowledge scores did not deteriorate for either intervention or control groups one to six months after interventions. Follow-up assessment of applied knowledge was conducted in some studies, however data were incomplete and not suitable for meta-analysis. Across all of the included studies, less than half of the studies (10 of 24) reported on short-term knowledge outcomes (within three months of the intervention), three studies reported medium-term outcomes (up to 12 months post-intervention), and only one study measured retention of knowledge beyond 12 months. A methodological problem in these studies was data completeness because, at the time of follow-up, control groups had already been exposed to the programmes and it is unethical to withhold programme delivery. Well-designed and timely follow-up is required to determine whether factual and applied knowledge can be sustained over time with the use of boosters and other maintenance strategies (such as reiteration of programme messages by parents and teachers). An important source of heterogeneity across studies is the knowledge measure used. For the 24 studies included in this review, 15 discrete questionnaire-based measures and six discrete vignette-based measures were used to measure children's factual and applied knowledge respectively. For studies included in the meta-analyses, there were 10 unique questionnaire-based measures and six unique vignette-based measures represented. These were pooled using the standardised mean difference (SMD) as a summary statistic.
In using SMDs, we treated the different assessment measures as though they were one standardised measure with comparable standard deviations (SDs). It is then difficult to relate this abstract figure back to the original measures to determine what this means in real life. For example, it is not clear what a 0.61 SD increase in factual knowledge or a 0.45 SD increase in applied knowledge translates to in practical knowledge terms. Are these findings sufficient to offer protective effects under threats of sexual abuse? Further research is required to address the magnitude of knowledge improvement required to produce clinically important protective effects. Research would be improved by the use of standardised rather than custom-made instruments.
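
One common way of re-expressing an SMD, offered here only as an illustrative sketch and not as part of the review's analysis, is Cohen's U3: the proportion of the control group that the average intervention participant would be expected to outscore. The calculation assumes that scores in both groups are normally distributed with equal SDs, which may not hold for the knowledge measures used in the included studies, and it does not address whether a gain of this size is protective in practice.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cohens_u3(smd: float) -> float:
    """Proportion of the control group scoring below the intervention group mean.

    Assumes normal distributions with equal SDs in both groups.
    """
    return normal_cdf(smd)


# Pooled SMDs reported in the review.
for label, smd in (("factual knowledge", 0.61), ("applied knowledge", 0.45)):
    print(f"{label}: SMD = {smd}, U3 = {cohens_u3(smd):.1%}")
```

Under these assumptions, the average child in the intervention groups would score higher than roughly 73% of control children on factual knowledge and roughly 67% on applied knowledge.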

Harm
Adverse or negative effects in the form of harm to participants were assessed via measures of child anxiety or fear. Studies examining participants' anxiety or fear were based on child self report and parent report. Meta-analysis of three studies found no evidence of increased or decreased anxiety or fear in those exposed to programmes and this did not change when adjusted for clustering. Narrative synthesis of included studies revealed that a small proportion of programme participants experienced anxiety or fear but these (anxieties or fears) were mild rather than severe, and short- rather than long-term. There was insufficient information to assess whether harms varied according to participant age or grade level. Although parent satisfaction data were collected in some studies, parental anxiety or fear was not measured in any study. This may be important in future studies for determining the role of parents in moderating programme effects.

Disclosures
The only direct measure of programme effects was participants' disclosures of past or current sexual abuse that were made following interventions. Disclosures were poorly reported or not reported in most studies. Our meta-analysis of three studies showed greater odds of disclosures by children receiving interventions. However, such disclosures cannot really be considered an adverse event since: (i) the onset of the alleged abuse would have occurred prior to the intervention; (ii) disclosing abuse, while potentially traumatic, can also prompt the provision of treatment; and (iii) the identification and reporting of perpetrators may prevent harm to other children. Details of how disclosures were dealt with were not reported in any of the studies. Appropriate systems for dealing with disclosures are important and must reflect jurisdictional legal reporting obligations (also known as mandatory reporting laws), and school policies for child maltreatment recognition, reporting, and response. Future studies should consider methods for recording and responding to disclosures; data linkage to child protection or police records, or both; and/or interviewing or surveying participants at repeated follow-up intervals.

Subgroup effects
Demographic characteristics (e.g. participant age, gender, ethnicity, socioeconomic position, and ability level) are potential sources of heterogeneity, and potential effect moderators. If studies do not account for these characteristics, important subgroup effects may be missed. Genuine but unidentified differences in study samples at baseline are potential sources of heterogeneity within and across studies. Baseline characteristics of intervention and control groups were inconsistently and poorly reported in the included studies. Control for baseline characteristics within individual studies is particularly important for criteria that are most relevant to learning, such as academic ability or reading age. These data were not reported or were absent by study design; therefore, we were not able to explore whether programme effectiveness varied according to key baseline criteria. These issues have implications for programme delivery. Demographic characteristics, such as participant age, would appear to be straightforward variables; however, mean age was not reported in eight of 24 included studies and in others was conflated with grade level. Few studies were undertaken with single grades, and most (18 of 24) studies were undertaken with multiple grade levels together. This study design limited the pooling of results across studies in meta-analysis. Subgroup analyses showed that older children (grades four and above) made greater knowledge gains than younger children (grades kindergarten to three) immediately post-intervention, results that are congruent with developmental and maturation theories. However, we do not know if younger children would respond differently with differentiated approaches (e.g. reinforcement of skills and knowledge by parents or teachers, or both). We were unable to assess programme effectiveness according to other potentially important participant variables (e.g. child gender, ethnicity, socioeconomic position, and ability level) as few studies reported on these data or provided subgroup effects.

Characteristics of effective programmes
Insufficient data were provided to evaluate the specific effects of programme type, duration, frequency, or setting. These programme characteristics have implications for delivery in schools and the ideal constellation of programme characteristics, which is not yet known. Although there was insufficient information to develop programme typologies and compare effects, we noted that approximately half of the programmes in included studies used content, such as the teaching of safety rules (e.g. "my body belongs to me"), and prevention concepts (e.g. distinguishing appropriate and inappropriate touches), and the use of delivery methods such as discussion, modelling, role-play, rehearsal, and feedback. Our narrative synthesis of included studies documented multidimensionality in intervention contents, methods, and delivery. This is an important finding in itself. To date, programmes have been categorised dichotomously as active or passive or behavioural or instructional. Our descriptive analysis shows this categorisation to be somewhat artificial as most programmes in this review were multifaceted with multiple components. Programmes covered multiple topics (e.g. body safety rules, distinguishing types of touches, reporting abuse to adults who can help), used teaching strategies in combination (e.g. discussion, modelling, role-play, rehearsal, and feedback), and integrated active or passive and behavioural or instructional approaches in one session (e.g. a video or DVD presentation encouraged children to listen and then partake in activities). The contribution to effectiveness of programme content, methods, and delivery will require documentation using standardised data collection tools in future studies. The duration and frequency (dose) of programme interventions varied from one single 45-minute session to eight 20-minute sessions. 
There were insufficient studies to create subgroup analyses for total programme hours, or total number of sessions, or for the presence or absence of booster sessions or reinforcement strategies. While interventions appear to increase protective behaviours and knowledge about sexual abuse, it is important that this learning is not seen as a replacement for adult responsibility to ensure child safety. Nor should education replace the need for appropriate medical and legal handling of those affected by child sexual abuse. We do not have evidence that these programmes reduce the incidence of child sexual abuse. The findings of this review need to be considered in the context of complementary prevention initiatives. Current child sexual abuse prevention frameworks suggest that strategies must not only target children, but must work on multiple elements of children's social systems to prevent abuse from occurring in the first place, namely at the level of the family, community, and society (Smallbone 2008).

Overall completeness and applicability of evidence
Studies were conducted in countries with high and upper-middle income economies according to the World Bank's analytical income categories (The World Bank 2013). Most (16 of 24) were conducted in North America, the remainder in Europe, East Asia, and Central Asia. Ethnicity data were poorly or not reported in 10 of the 24 studies. Where data were reported, participants were from a diverse range of ethnicities, increasing the generalisability of the evidence, and also suggesting that concern about child sexual abuse prevention and the delivery of programmes in schools is a wide-spread phenomenon. Whether similar effects would be seen when programmes are implemented in countries not included is unknown. All but one of the included studies was conducted in primary (elementary) school settings. There are several possible reasons for this. First, policy makers and school authorities may truly recognise that the age of greatest vulnerability is within the earlier school years (7 to 12 years according to Finkelhor 1986). Second, from our searches, we gleaned that programmes for secondary (high) school students tended to be broader in scope and focused on the prevention of relationship and dating violence, sexually coercive peer relationships, sexual harassment, or sexual assault (see Fellmeth 2013). The purpose of these programmes was not predominantly prevention of child sexual abuse, the focus of this review. In our searches we noted a sizeable group of studies based in preschool settings, the effectiveness of which requires further scrutiny in a separate systematic review given that these programmes have qualitatively different delivery methods and contents, including greater parental participation, which we infer may have a mediating effect. None of the included studies investigated the effectiveness of a web-based or online programme. This may be because rigorous programme evaluations have not yet been developed, conducted, or published. 
Online programmes offer the potential for technology to capture real-time evaluation data from children as they experience online interventions. As noted above, the completeness and applicability of evidence was limited by methodology and failure to report the full range of child, intervention, and study design characteristics that could possibly account for variations in programme effects. In the period since the original review was conducted (Zwi 2007), Cochrane Reviews have become more rigorous in identifying methodological limitations in trials via risk of bias analyses, and the CONSORT statement has been developed to provide guidance on the reporting of randomised controlled trials (Shulz 2010). Nevertheless, the methodological quality of trials has not improved substantially. No study in this area has yet published a study protocol, and we found no clinical trials register records pertaining to studies of this type. Researchers must continue testing these interventions, but use study design methodology, data collection tools, registration, and reporting guidelines that enable rigorous scientific evaluation.

Quality of the evidence
Summary of findings for the main comparison presents the quality of evidence for each outcome of interest. We downgraded the quality of evidence to moderate quality either due to risk of bias, imprecision, or because of the impact of adjusting for the effect of clustering within some of the studies. Most studies in this review were at an unclear risk of selection bias as illustrated in Figure  1 and Figure 2, due to inadequate information regarding methods of random sequence generation and allocation concealment. Studies which randomised classes within a single school to intervention and control groups were at high risk of contamination effects owing to the interaction of children in school playgrounds, friendship groups and families, and also from chance exposure to programme concepts via teachers and other school staff familiar with programme contents. In addition, there was detection bias due to inadequate or unclear assessor, participant and personnel blinding, and inadequate or unclear reporting of attrition for assessments at post-test and follow-up. Double-blinding to minimise performance bias is seldom possible in school-based trials as group membership is obvious to participants, programme facilitators, and school staff. Blinding of staff responsible for assessing study outcomes can be controlled with careful planning and implementation. This would be particularly effective where outcome assessments are administered with children individually. However, group administration of self report questionnaires or vignette measures may be more susceptible to bias when used with younger participants who are not yet able to read independently. Alternative administration methods, including the use of digital devices and animations, may go some distance to minimising detection bias. In 14 of the included studies children were randomised in groups of classrooms, schools, or school districts for ease of implementation. 
However, the appropriate analysis for cluster-randomisation was not used in any of the studies resulting in potential for overestimation of the effects of interventions. Initial analyses do not take account of unit of analysis errors that occurred in at least half of the studies in each meta-analysis. ICCs used in the meta-analysis are imputed and may not be appropriate for all of the studies included. Therefore, results might have differed had the true ICCs from these studies been available, or had cluster-adjusted results been provided by the authors. Furthermore, the same ICC was used for studies that had undertaken cluster-randomisation at class, school, and district level, which could further overestimate the magnitude of the findings.
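The effect of ignoring clustering can be sketched with the standard design effect, 1 + (m - 1) x ICC, where m is the average cluster size. The example below is an illustration only: the class size of 25 and the arm size of 200 are hypothetical, and the ICC values of 0.1 and 0.2 are used purely as illustrative inputs rather than as estimates for any included study.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation caused by randomising clusters instead of individuals."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: float, mean_cluster_size: float, icc: float) -> float:
    """Number of independent participants the clustered sample is 'worth'."""
    return n / design_effect(mean_cluster_size, icc)


def inflate_standard_error(se: float, mean_cluster_size: float, icc: float) -> float:
    """Alternative adjustment: widen an unadjusted standard error."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


ASSUMED_CLASS_SIZE = 25   # hypothetical average cluster size
ASSUMED_ARM_SIZE = 200    # hypothetical number of children in one trial arm

for icc in (0.1, 0.2):    # illustrative imputed ICC values
    deff = design_effect(ASSUMED_CLASS_SIZE, icc)
    n_eff = effective_sample_size(ASSUMED_ARM_SIZE, ASSUMED_CLASS_SIZE, icc)
    print(f"ICC={icc}: design effect={deff:.2f}, effective n={n_eff:.1f}")
```

With these assumed inputs, 200 children randomised by class carry roughly the statistical information of 59 (ICC 0.1) or 34 (ICC 0.2) independently randomised children, which is why unadjusted analyses yield confidence intervals that are too narrow. Because the design effect grows with cluster size, applying a single ICC to classes, schools, and districts alike is a simplification.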

Potential biases in the review process
In producing this review our aim was to provide an unbiased appraisal of the evidence available. We have attempted, therefore, to be comprehensive in our reporting and transparent in our methodology. The review was conducted in line with criteria in the published protocol (Zwi 2003), and where we deviated from these criteria to accommodate updates in Cochrane review methods or advances in the field, we have documented this in the subsection on Differences between protocol and review. The methodological decision to produce each meta-analysis was complex, involving a balance between the quest for an easily digestible summary of the information, and the danger of applying results when significant methodological caveats exist. We present the meta-analyses with accompanying cautions as outlined above, and invite debate and comments regarding the route we have chosen.

Agreements and disagreements with other studies or reviews
Five previous meta-analyses of sexual abuse prevention programmes exist as noted in Table 1, including the original version of this review (Zwi 2007). Our review differs from previous reviews in that it assesses a broader range of outcomes, applies more rigorous inclusion criteria to select high quality studies, and excludes preschool programmes. Further, all previous reviews included studies with control groups but did not apply randomisation criteria, therefore unlike our review, previous reviews included controlled before-and-after studies. All previous reviews have found medium to large effects for knowledge outcomes in favour of intervention groups. These effect sizes ranged from 0.57 (Heidotting 1994, 18 studies), through 0.71 (Rispens 1997, 16 studies) and 0.90 (Berrick 1992, 13 studies) to 1.07 (Davis 2000, 27 studies). Our previous review found a SMD of 0.59 (95% confidence interval (CI) 0.44 to 0.74; nine studies, n = 3022) for the questionnaire-based knowledge outcome, which is the outcome most comparable to the outcomes reported in previous reviews. The current review found a SMD of 0.61 (95% CI 0.45 to 0.78; 18 studies, n = 4657). Davis 2000 attempted subgroup analyses to examine moderator effects: age (mean age was divided into three groups: three to five years, 5.1 to eight years, older than eight years of age), level of participation (participation was analysed at three different levels: physical participation, verbal participation, no participation), and number of sessions (three subsets: one session, two to three sessions, more than three sessions). Due to inadequate reporting of study data, we were unable to replicate these meta-analyses, and would caution against using the broad variable of participation as the only marker for programme variation. 
Given that most programmes include multiple participatory opportunities, often in combination, it may be more informative to develop and explore the effects of multidimensional programme typologies as noted above.

Implications for practice
Our overall interpretation is that there is moderate quality evidence that school-based programmes for the prevention of child sexual abuse, of the types described in this review, are effective in increasing primary (elementary) school-aged children's protective behaviours and knowledge immediately post-intervention. Knowledge scores did not deteriorate for intervention participants one to six months after programme participation, signalling that booster sessions or other maintenance strategies for reinforcement of key messages remain appropriate follow-up strategies. Retention of knowledge should be measured beyond six months. It appears that older children make greater knowledge gains than younger children when tested using questionnaire-based measures, but not when using vignette-based measures, indicating the need for caution when interpreting study findings. None of the included studies evaluated programmes delivered via electronic means. On balance of evidence, programmes do not appear to increase or decrease children's fear or anxiety, and may result in greater odds of disclosures of past or current sexual abuse from children who have been programme participants; however, results are uncertain because of inappropriate data analysis in individual studies. Hence, there is a need for ongoing monitoring of both positive and negative short- and long-term effects of programmes in more rigorous studies with more detailed reporting of potential moderators of programme effects in the form of child, programme, and contextual characteristics.
Currently, schools implement a variety of interventions aimed at preventing child sexual abuse. It is likely that these interventions will be most useful as part of wider community initiatives promoting the safety of children, the contents, processes, and outcomes of which must be clearly defined and measured in rigorous evaluation designs. Furthermore, children's increased knowledge of abuse should not be seen as a replacement for society's responsibility to ensure child safety. It must be emphasised that increasing children's knowledge in this area does not mean they are in any way responsible for abuse, which might then occur by their not being able to apply this knowledge in an actual abuse situation. Even if successful in only a small proportion of situations, given the prevalence of child sexual abuse, it is possible that the skills and knowledge learned in prevention programmes may be of assistance to a considerable number of children.

Implications for research
Further evidence is required to assess the effectiveness of school-based programmes for the prevention of child sexual abuse. The current evidence is primarily focused on improvements in participants' skills (protective behaviours) and knowledge (both factual and applied knowledge), and to a lesser extent on assessing harm (child anxiety or fear) and disclosures of past or current child sexual abuse. Further research is needed to investigate the links (if any) between programme participation and actual prevention of child sexual abuse. This will require large cohort studies with repeated follow-up into adulthood. However, even large cohort studies may not provide definitive evidence for changes in child sexual abuse incidence, as it is under-identified and difficult to prove. Further research is also required to address uncertainties about the magnitude of skill or knowledge improvement (or both) that can (if at all) translate to clinically important protective effects. Such evidence is a necessary precursor to assessing programmes' cost-effectiveness.
Ongoing research is needed to more rigorously evaluate programmes. Research to date suggests several categories of factors that may influence programme effectiveness, such as child factors, including family microsystem factors; programme factors, including school contextual factors; and evaluation design factors (Heidotting 1994; Rispens 1997). These require further investigation in well-designed experimental studies. Many demographic and other independent variables were poorly reported in the included studies. Reliable evidence of this type will advance assessment of programmes' cultural sensitivity, and the appropriateness of programmes for groups of children considered at greater risk. Future evaluations must be more comprehensive, use valid, reliable, standardised measures, and be more precisely reported, according to evidence-based guidelines for reporting of clinical trials such as the CONSORT (Consolidated Standards of Reporting Trials) Statement (Shulz 2010).
Further investigation of programme contents, methods, and delivery is required with a view to developing programme typologies that can incorporate the programmes' multidimensionality.
To this end, typologies should be developed that capture variables emerging as important in newly developed frameworks for child sexual abuse prevention (Smallbone 2008), such as the extent and nature of parent, teacher, and community education components within programmes.
Future studies should address problems with study design, in particular unit of analysis errors in cluster-randomised trials. Studies where cluster-randomisation is used should adjust results with appropriate statistical methods, and publish intra-class correlation coefficients (ICCs) (Campbell 2004). It may then be possible for meta-analyses to be more robust, and to overcome inadequate sample size and study power to test for differences in child characteristics and intervention types. Other design features that warrant particular attention in future studies include those domains associated with risk of bias: randomisation of study participants, allocation concealment, blinding of outcome assessors, reporting of attrition, and analysis based on intention-to-treat (ITT). Longer follow-up periods for measurement of study outcomes beyond six months are essential to monitor maintenance effects.

ACKNOWLEDGEMENTS
In the original review, Danielle M Wheeler acknowledged support from the Financial Markets Foundation for Children and the Nordic Campbell Centre. Dr Andrew Hayen (Australia) and Dr Roger Harbord (UK) provided much valued statistical advice.
In the review update, Kerryann Walsh was funded by a Queensland University of Technology Vice Chancellor's Research Fellowship (2010 to 2012) and acknowledges the Australian Research Council Discovery Projects scheme (DP1093717). Pauline Mulligan and Leisa Brandon provided much valued research assistance. The Australasian Cochrane Centre provided training and review completion workshops.
The authors are especially grateful for the comments of external Cochrane reviewers and statisticians, and for the expert advice from the Cochrane Developmental, Psychosocial and Learning Problems Group editorial base: Professor Geraldine MacDonald (Co-ordinating Editor), Dr Joanne Wilson (Managing Editor), Gemma O'Loughlin (Assistant Managing Editor), and Margaret Anderson (Trials Search Co-ordinator) who conducted the searches for this review update. The authors also wish to thank Laura MacDonald, former Managing Editor, Cochrane Developmental, Psychosocial, and Learning Problems Group, for her support during development of the update of this review.

Characteristics of included studies [ordered by study ID]
Blumberg 1991
Interventions
Intervention 1: role-play programme ("Stop, Tell someone, Own your body, Protect yourself" (STOP))
• Content: body ownership/body rights; body openings needing protection (eyes, ears, private places); appropriate and inappropriate touches; safety rules (Stop, Go, Tell, tell, tell and keep telling until somebody listens); perpetrators are usually someone known to the child; sexual abuse is not the child's fault; appropriate and inappropriate secrets
• Methods: role-play, modelling, rehearsal, and discussion
• Delivery: by volunteers trained by a licensed social worker with expertise in child sexual abuse
Intervention 2: multimedia programme ("Child Abuse Primary Prevention Program" (CAPPP))
• Content: discriminating types of touches based on feelings; they have the right to say no; safety rules "Say No," "Go," and "Tell"; no one should touch private areas unless you need help; "touching secrets" or secrets that hurt should never be kept; sexual abuse is never the child's fault
• Methods: younger children were taught concepts through use of teddy bear and viewed a film; older children were taught through a puppet show and discussion
• Content: body ownership; distinguishing appropriate from inappropriate touches and requests; distinguishing types of secrets; and abduction prevention training based on the book "Who Is a Stranger and What Should I Do?" (Girard 1985)
• Methods: instruction; modelling, role-play, rehearsal, practice, feedback, and reinforcement
• Delivery: details not reported
Control: wait list control
Duration: 2 x 50-minute sessions delivered "at the beginning of the school day… before children began their regular academic classes" (p 628)

Outcomes
Protective behaviours simulation: no Knowledge (questionnaire-based knowledge): Children's Sexual Knowledge Questionnaire (CSKQ), a 6-item self report knowledge questionnaire with response items correct/ incorrect/I don't know Knowledge (questionnaire-based knowledge): Children's Awareness of Scary Secrets (CASSQ), a 6-item self report measure to distinguish okay from not okay secrets. Items

Chen 2012 (Continued)
Selective reporting (reporting bias) Low risk All measures discussed in the methods section of the article were also reported in the results. However, some data were incomplete. Missing data were provided after author contact Unclear risk "Fourth and fifth grade children (n = 293) were randomly assigned to one of four groups" (p iii). Method of randomisation was not reported Allocation concealment (selection bias) High risk Method of concealment was not described.

Crowley 1989
Potentially unconcealed procedure. Tests of baseline imbalances were conducted: "successful randomisation of Groups 1 & 2 did not occur" (pp 60 -61). There were differences in pre-test mean scores for groups 1 and 2. Group 1 had higher scores on the pre-test SAKI than group 2. Group 3 had higher scores on the SAKI and PSQ than group 4 Blinding of participants and personnel (performance bias) All outcomes High risk Blinding procedures were not reported. It did not seem that the intervention groups were blinded to their own condition. Homeroom teachers were present during programme delivery, so it was not possible for them to be blinded to the students' conditions Blinding of outcome assessment (detection bias) All outcomes High risk Group administration of the outcome assessment meant that outcome assessors would need to be blinded to the condition of each entire class or homeroom. Given that the assessors were school staff, blinding was not possible. On some occasions the outcome assessor was the researcher who was not blinded to the groups. 
On some other occasions the outcome assessor was the programme presenter who was also not blinded to the groups Incomplete outcome data (attrition bias) All outcomes High risk Data on 74 participants (20%) were excluded due to missing or incomplete data, or absence during a portion of the study Interventions Intervention: child abuse prevention curriculum modified from an existing curriculum (title not reported) • Content: abuse problems children may encounter; people in family and community support systems that children can turn to in abuse situations; 3 types of touches; personal safety rules regarding potential child abuse; child abuse is never a child's fault; child abuse should never be kept secret; empathy for others who find themselves in abusive situations • Methods: role-play, video, discussion Blinding of outcome assessment (detection bias) All outcomes High risk Group administration of the outcome assessment meant that outcome assessors would need to be blinded to the condition of whole schools. This may not have been possible under the circumstances. The identities of the outcome assessors were not reported Incomplete outcome data (attrition bias) All outcomes High risk Attrition was reported as 24% due to "absenteeism" and "unmatchable questionnaires" (p 78) Selective reporting (reporting bias) Low risk All measures discussed in the methods section of the article were also reported in the results

Bias Authors' judgement Support for judgement
Random sequence generation (selection bias): Unclear risk. "Classes in the selected schools were randomly assigned to the different treatment groups" (p 45). Method of randomisation was not reported
Allocation concealment (selection bias): Unclear risk. Method of concealment was not described. Potentially unconcealed procedure. Tests of baseline imbalances were conducted. Age, race, and gender ratios were not significantly different among groups. However, results showed that the mean pre-test knowledge test score for group B (control 1) was significantly higher than A (intervention) or C (control 2) on the pre-test (p 82)
Blinding of participants and personnel (performance bias) All outcomes: High risk. Blinding procedures were not reported. Students within one school were receiving both treatment and control conditions. Authors indicate that children may have been exposed to "grapevine" effect (p 51) whereby information was transmitted informally throughout the school, or between siblings in a family or across families having contact with each other outside of school. School personnel did not appear to be blinded to group or class membership so there is risk of differential treatment of groups

Dawson 1987 (Continued)
Blinding of outcome assessment (detection bias) All outcomes: High risk. Classroom teachers, a guidance counsellor, and the researcher served as outcome assessors. Outcome assessors remained in the classroom during the child sexual abuse prevention presentation, therefore, it was not possible for them to be blinded to the groups they were assessing. It is not clear if outcome assessment was administered individually to children, or in group format with whole classes
Incomplete outcome data (attrition bias) All outcomes: Low risk. Attrition is noted as 7.3% intervention, 2.6% control 1, 3.1% control 2. Incomplete data were noted as due to student absence or withdrawal from school. It is possible that there were differences between students with complete and incomplete data
Selective reporting (reporting bias): Low risk. All measures discussed in the methods section of the article were also reported in the results. Additional interaction effects were presented
Harm: information on programme side effects was collected in a questionnaire for parents (12-item version) and educators (9-item version) asking for observations of positive and negative changes in children's behaviour after programme completion
Other: qualitative assessment of children's participation in the programme during delivery. These data were collected using an observation sheet completed by educators acting as "participant observers" (p 2)
Last outcome assessment: 8 months after programme completion
Notes
Author contact: yes
The curriculum evaluated in this study is the 1st elementary school curriculum of its type developed for delivery in Spain

Bias Authors' judgement Support for judgement
Random sequence generation (selection bias) Unclear risk "Subjects were randomly assigned" (p 2). Method of randomisation was not reported

Bias Authors' judgement Support for judgement
Random sequence generation (selection bias): Unclear risk. "Twenty-four each were randomly assigned to the experimental and control groups tested" (p 174). Method of randomisation was not reported
Allocation concealment (selection bias): Unclear risk. Method of concealment was not described. Potentially unconcealed procedure. Tests of baseline imbalances revealed "pretest scores on each of the three tests administered were very nearly the same for the two study groups" (p 177)
Blinding of participants and personnel (performance bias) All outcomes: High risk. As there was only 1 intervention group, there was no possibility for systematic differences between groups in the way in which the programme was delivered. However, as the control group were from the same school, they may have experienced some contamination or exposure to the programme via other students in the playground, or friends, or siblings outside of the study setting
Blinding of outcome assessment (detection bias) All outcomes: Low risk. Children were blinded to the simulation test. "A research assistant, posing as a stranger" (p 175) conducted the outcome assessment. The blinding of the assessor (if any) is not reported. "A hidden camera and wireless microphone produced an audiovisual record of the encounter which was later reviewed and scored by research team members" (p 176). Interrater reliability was established as 1.0 (total reliability)
Incomplete outcome data (attrition bias) All outcomes
Blinding of participants and personnel (performance bias) All outcomes

High risk
The intervention groups were not blinded to their own condition and school personnel were not blinded to group or class conditions since teachers attended training and completed measures. Since both intervention and control groups were from the same school, there is a possibility of treatmentcontrol contamination effects Blinding of outcome assessment (detection bias) All outcomes Low risk Outcome assessment was conducted individually with each participant. "Every effort was made to keep the assistants naive to the hypotheses and to the group membership of the subjects" (p 72) Incomplete outcome data (attrition bias) All outcomes High risk Incomplete outcome data, mainly in the form of "missing data due to students' absence, withdrawal from school, unwillingness to participate" (p 70). This is high: 19% intervention group; 22% control group Selective reporting (reporting bias) Low risk All measures discussed in the methods section of the article were also reported in the results there was approximately an equal number of black and white boys and girls per group) to one of two groups: an experimental group and a placebo control group" (p 432). Method of randomisation was not reported Allocation concealment (selection bias) Unclear risk Method of concealment was not described. Potentially unconcealed procedure. In terms of baseline imbalances, results indicated no significant differences in the age of children, family socioeconomic status, gender, or race between experimental and control groups

Harvey 1988
Blinding of participants and personnel (performance bias) All outcomes High risk Blinding procedures were not reported. 2 "experimenters" delivered the intervention programme (p 431). "Each experimenter conducted experimental and placebo control sessions in two schools" (p 431). These individuals could not have been blinded to study conditions; however, the use of 2 individuals increases the risk that compared groups received different interventions
Blinding of outcome assessment (detection bias) All outcomes Unclear risk Outcome assessment was conducted "individually for each child at pre intervention, postintervention, and follow up" (pp 431-2). The measures used to blind outcome assessors from knowledge of which intervention participants received were not reported
Incomplete outcome data (attrition bias) All outcomes High risk Attrition was reported only for the study overall, and not specified for intervention and control groups. Attrition and missing data were attributed to student absence during the programme or testing, and moving from the school. Attrition was calculated overall as 19/90 (21%)
Selective reporting (reporting bias) High risk Not all measures discussed in the methods section of the article were also reported in the results. Means and SDs for knowledge outcomes were measured but not reported
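The overall attrition figure quoted above (19 of 90 participants, about 21%) can be reproduced with a short calculation. The sketch below is illustrative only: the function name `attrition_rate` is an assumption introduced here and is not part of the review's methods.

```python
def attrition_rate(lost: int, randomised: int) -> float:
    """Return attrition as a percentage of the participants randomised."""
    if randomised <= 0:
        raise ValueError("randomised must be a positive count")
    if not 0 <= lost <= randomised:
        raise ValueError("lost must lie between 0 and randomised")
    return 100.0 * lost / randomised


# Overall attrition reported for Harvey 1988: 19 of 90 participants.
harvey_overall = attrition_rate(19, 90)
print(f"{harvey_overall:.0f}%")  # prints "21%"
```

Because the study reported attrition only for the sample as a whole, the same calculation cannot be repeated separately for the intervention and control groups.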

Kolko 1989 (Continued)
Allocation concealment (selection bias) Unclear risk Method of concealment was not described. Potentially unconcealed procedure. No baseline imbalances were detected between groups. There were 6 intervention schools and only 1 control school, meaning that the groups were not equivalent. Adjustment procedures to address these imbalances were not reported
Blinding of participants and personnel (performance bias) All outcomes High risk Blinding procedures were not reported. However, it is likely that participants were not blinded to their condition. Blinding of key personnel within the school was not possible as they were involved in programme delivery
Blinding of outcome assessment (detection bias) All outcomes Unclear risk It was not clear if outcome assessment was administered individually to children, or in group format with the whole class. The identities of outcome assessors were not reported. Methods of blinding were not reported
Incomplete outcome data (attrition bias) All outcomes High risk Attrition was high (as noted above). Reasons for attrition were not reported
Selective reporting (reporting bias) Low risk All measures discussed in the methods section were reported in the results section

• Content: promoting children's skills in handling interactions with adults in which they feel uncomfortable, such as being asked to keep a secret about which they feel

schools' conditions, that is, they may or may not have been aware that they were getting/not getting something equivalent to other groups (e.g. via correspondence with the Berlin Police). It is possible that students were blinded, but teachers were not. It is possible that teaching staff in the DVD group may have compensated for not having the live performance, which may have altered results

Krahé 2009
Blinding of outcome assessment (detection bias) All outcomes Low risk Group administration of the outcome assessment meant that outcome assessors would need to be blinded to the condition of whole schools. 4 interviewers conducted the outcome assessments. "One of them was the second author, who was not blind with regard to the hypotheses and experimental conditions. Half of the sessions in each school were conducted by the second author, the remaining sessions were conducted by the three additional interviewers who were blind as to the hypotheses of the study and the group membership of the children they tested. In this way, the same number of sessions was run by the second author and the additional interviewers in each condition. No differences between the conditions were found in relation to different interviewers" (p 325 footnote 3)
Incomplete outcome data (attrition bias) All outcomes

conditions" (p 822). Method of randomisation was not reported
Allocation concealment (selection bias) Unclear risk Method of concealment was not described. Potentially unconcealed procedure
Blinding of participants and personnel (performance bias) All outcomes High risk Blinding procedures were not reported. Students within 1 school were receiving treatment and control conditions. There was a possibility of treatment-control contamination of information transmitted informally throughout the school. School personnel did not appear to be blinded to group or class membership so there is risk of differential treatment of groups
Blinding of outcome assessment (detection bias) All outcomes Low risk "Data were collected by assigned evaluators from subjects in both treatment and control groups on the same day... All data were collected in a blind assessment format with the evaluators unaware of which classrooms were assigned to treatment or control conditions" (p 824). Outcome assessments were administered in group format in classrooms
Incomplete outcome data (attrition bias) All outcomes Unclear risk Missing data were not reported. Attrition was not reported
Selective reporting (reporting bias) Low risk All measures discussed in the methods section of the paper were also discussed in the results

Poche 1988
Blinding of outcome assessment (detection bias) All outcomes Low risk "The adult portraying an abductor served as the primary observer and recorded each child's verbal and motor responses as soon as the simulation was over. This observer was blind to the experimental condition of each subject" (p 257). Another adult "served as a reliability observer" (p 257). Agreement between the two observers was 100% (total reliability)
Incomplete outcome data (attrition bias) All outcomes Unclear risk Attrition between pre- and post-test was not reported. At 1-month follow-up, only 23/74 children (31%) met the criteria for outcome assessment (pp 256-7). Of these, only 9 were available to take part (12%). Reasons for attrition were "summer vacations, disconnected phones, illnesses and accidents" (p 257)
Selective reporting (reporting bias) Low risk All measures discussed in the methods section of the paper were also discussed in the results
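The table does not say how agreement between the two observers was computed. One common measure is simple percent agreement: the share of children for whom both observers recorded the same score. The sketch below illustrates that measure under this assumption; the function name and the example scores are hypothetical and are not data from the trial.

```python
from typing import Sequence


def percent_agreement(primary: Sequence[int], reliability: Sequence[int]) -> float:
    """Percentage of cases in which two observers recorded the same score."""
    if len(primary) != len(reliability):
        raise ValueError("both observers must score the same number of cases")
    if not primary:
        raise ValueError("at least one scored case is required")
    matches = sum(a == b for a, b in zip(primary, reliability))
    return 100.0 * matches / len(primary)


# Hypothetical scores for five simulated encounters (not trial data).
primary_scores = [3, 2, 3, 1, 0]
reliability_scores = [3, 2, 3, 1, 0]
print(percent_agreement(primary_scores, reliability_scores))  # prints 100.0
```

Percent agreement does not correct for agreement expected by chance, so it is best read alongside the number of cases that were double-scored.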

Saslawsky 1986
Methods
Design: quasi-experimental randomised Solomon 4-group design
Unit of allocation: individuals
Intention-to-treat analysis: no
Adjustment for clustering: no
Participants
Total number randomised: 67 students (26 kindergarten and 1st grade students; 41 5th and 6th grade students)
Mean age: 6.2 years; 11.1 years
Gender: 52% male; 48% female
Ethnicity: not reported
Setting: 2 public schools in lower to middle class areas in rural eastern Washington
Country: USA
Attrition: not reported
Interventions
Intervention: children viewed the 35-minute film "Touch" (Illusion Theater Company 1984)
• Content: portrayal of abusive incidents with modelling of 4 prevention skills (say no; yell for help; get away; tell someone and keep telling until someone believes you)
• Methods: film; followed by a 15-minute discussion about children's feelings,

Bias Authors' judgement Support for judgement
Random sequence generation (selection bias) Low risk "A coin was tossed to determine group assignments" (p 45). No other information was provided
Allocation concealment (selection bias) Unclear risk Method of concealment was not described. Potentially unconcealed procedure
Blinding of participants and personnel (performance bias) All outcomes Low risk Blinding procedures were not reported. Children may not have been blinded to their condition. Blinding of key personnel (e.g. teachers) may not have been possible in the school delivery context. The programme and testing were conducted on the same day in an attempt to control for contamination effects
Blinding of outcome assessment (detection bias) All outcomes High risk Group administration of outcome assessment meant that outcome assessors would have to be blinded to the condition of entire classes. This was not possible as outcome assessors were also programme presenters
Incomplete outcome data (attrition bias) All outcomes Low risk Missing data were noted for 8/177 participants (4.5%) owing to parental omissions on the child data sheet. Attrition from the study is not reported
Selective reporting (reporting bias) Low risk All measures discussed in the methods section were also reported in the results section

Interventions
Intervention: 2 x 5-minute plays written and performed by volunteer medical students who consulted with child abuse specialists
• Content: 5 themes: abuse can be perpetrated by someone you love and trust; feelings generated in such circumstances; importance of telling someone, even if unsure of what is happening; abuse is not your fault; and getting help right away is the best way to respond
• Methods: theatrical skits depict "a child at school who was upset about (abusive) events that had happened at home on the previous evening" (p 88); followed by 1-hour discussion
• Delivery: by volunteer medical students who consulted with a child abuse specialist
Control: wait-list control
Duration: 1 x 70-minute session

Outcomes
Protective behaviours simulation: no
Knowledge (questionnaire-based knowledge): a brief 10-item true/false questionnaire focusing on programme objectives

Bias Authors' judgement Support for judgement
Random sequence generation (selection bias) Unclear risk "The 12 classrooms participating in the study were randomly assigned to a control or treatment condition" (p 88). Method of randomisation was not reported
Allocation concealment (selection bias) Unclear risk Method of concealment was not described. Potentially unconcealed procedure
Blinding of participants and personnel (performance bias) All outcomes Unclear risk Blinding procedures were not reported. It is not known whether whole schools were allocated to conditions or whether schools comprised classes allocated to both treatment and control conditions. The latter presents a higher risk of treatment-control contamination
Blinding of outcome assessment (detection bias) All outcomes Unclear risk Method of assessment (group or individual administration) was not specified. The measures used to blind outcome assessors from knowledge of group membership were not reported
Incomplete outcome data (attrition bias) All outcomes Unclear risk Missing data were not reported. Attrition was not reported
Selective reporting (reporting bias) High risk "Three items were dropped from the final questionnaire due to their inability to contribute to the validity of the measure" (p 89); therefore, outcome data for only 7 questionnaire items are reported