A Meta-Analysis of Empirically Tested School-Based Dating Violence Prevention Programs

Teen dating violence prevention programs implemented in schools and empirically tested were subjected to meta-analysis. Eight studies met criteria for inclusion, consisting of both within and between designs. Overall, the weighted mean effect size (ES) across studies was significant, ES r = .11; 95% confidence interval (CI) = [.08, .15], p < .0001, showing an overall positive effect of the studied prevention programs. However, 25% of the studies showed an effect in the negative direction, meaning students appeared to be more supportive of dating violence after participating in a dating violence prevention program. This heightens the need for thorough program evaluation as well as the need for decision makers to have access to data about the effectiveness of programs they are considering implementing. Further implications of the results and recommendations for future research are discussed.

One in three teenagers experiences verbal, emotional, physical, or sexual abuse by a person they are dating (Davis, 2008). That translates into about 1.5 million students of high school age per year who are victims of dating violence (Centers for Disease Control and Prevention, 2006). Among women, the age group with the highest rate of dating violence is 16- to 24-year-olds (U.S. Department of Justice, Bureau of Justice and Statistics, 2006), and dating violence appears to be by far the most common type of youth violence (Davis, 2008). These numbers are very disconcerting given that they represent an increase from the previous decade. For example, surveys of secondary school students in the 1990s found that only 12% to 59% of high school students experienced non-sexual violence in their dating relationships and another 10.5% to 14.7% experienced sexual violence, such as coerced kissing, touching, and intercourse (Bergman, 1992; Foshee et al., 1996; Jezl, Molidor, & Wright, 1996). The negative consequences of teen dating violence are well known. Adolescents involved in dating violence are at higher risk of further violence in future relationships, riskier sexual behavior (Lormand et al., 2013), and increased rates of substance use and eating disorders (Silverman, Raj, Mucci, & Hathaway, 2001). There is also evidence suggesting violence in relationships escalates over time, and some criminological scholars have spoken of "careers of wife assault" as early as the early 1980s (e.g., Pagelow, 1981). Clearly, teen dating violence represents a problem that needs to be addressed in a manner capable of reaching large numbers of youth.
School personnel are mostly aware of the problem of teen dating violence and many have implemented dating violence prevention programs in their curricula. Because they can reach a large percentage of an age cohort, schools provide a good venue to engage in primary dating violence prevention. Primary prevention programs are interventions that target students who have not previously engaged in violent acts. Primary prevention is successful if the targeted participants do not become involved in violence (Foshee et al., 1996). Secondary prevention is an intervention that targets those who have already committed acts of violence. Secondary prevention is successful if no new acts of violence are committed and/or the targeted participants are no longer victimized after the intervention. Dating violence prevention programs can vary in terms of their focus on primary versus secondary prevention. For the purpose of this meta-analysis, we focus on programs that are exclusively or mostly focused on primary prevention, aiming to reach a wide range of students.
SAGE Open, DOI: 10.1177/2158244014535787, Edwards and Hinsz
Although many teen dating violence prevention programs have been implemented in schools, most of them are not empirically evaluated (Macgowan, 1997). Instead, programs are often reported in the popular media and/or in the form of descriptions and narratives as opposed to empirical evaluation (Nightingale & Morrissette, 1993; Roden, 1991; Sousa, 1991). In the absence of rigorous scientific testing and empirical evidence, little can be known about the effectiveness of school-based teen dating violence prevention programs. This also makes it difficult for teachers, administrators, and school districts to select which program to implement. There is a clear need for evidence-based programs, studies that evaluate the effectiveness of the programs, and, last, comparisons among the programs to identify those that are most effective.
The aim of this article is to (a) review existing primary dating violence prevention programs in schools and (b) provide a comparison of their effectiveness through methods of meta-analysis. We focus on programs targeting Grades 8 through 12 that have undergone empirical study and been peer-reviewed. These criteria were chosen for several reasons: First, school programs ensure that a large part of the target population is exposed to the prevention. Arguably, no other program site has more opportunities to reach a broad population and a specific age group. Second, prevention efforts during the teen years might be especially fruitful, because this is the time when students develop scripts that will guide their future actions in relationships (Koss, 1990; Levine & Kanin, 1987; Lonsway, 1996). In addition, adolescents at that age are also at risk of internalizing relationship violence as a normal part of their behavioral repertoire (Koss, 1990; Levine & Kanin, 1987; Lonsway, 1996). Last, this meta-analysis focuses only on programs that have been rigorously tested using empirical methods and been peer-reviewed. This ensures the meta-analysis includes strong programs that went through rigorous development. Our hope is this article will be a resource to those who seek to implement strong programs in their communities.
Research in various fields suggests that school-based programs have the desired effects on a variety of undesirable behaviors among teenagers, such as engaging in disordered eating (e.g., Stice, Rohde, Shaw, & Gau, 2011), bullying others (e.g., Brown, Low, Smith, & Haggerty, 2011; Olweus & Limber, 2010), engaging in substance (Faggiano et al., 2010) and tobacco (e.g., Andrews et al., 2014) use, and self-injury (e.g., Dunstan, Buckley, Chapman, Reveruzzi, & Sheehan, 2012). We therefore hypothesize that programs overall have a moderately positive impact on attitudes and behaviors conducive to subsequent dating violence. However, differences in program effectiveness could be expected based on variables such as mode of delivery, extended versus brief programs, trainer characteristics, and so on. We expect that programs that are longer in duration, use trained professionals as presenters, and use a variety of presentation modes will achieve better results. Recommendations for the types of programs that seem most promising will be made.

Description of Inclusion Criteria
Studies were included in the review if they (a) used an experimental or quasi-experimental design; (b) provided outcome data in a repeated measures design or for both the treatment and control group; (c) sampled 8th- to 12th-grade students; (d) used an intervention designed to reduce dating and/or sexual violence; (e) focused on, or included separate analyses of, primary dating and/or sexual violence prevention (i.e., the sample had no history of dating or sexual violence offending); and (f) included either attitudinal or behavioral outcome measures of sexual and dating violence (i.e., attitudes toward dating violence, self-reports of violent dating behavior or use of sexual violence, or agency or school reports of dating or sexual violence). Outcome measures were defined as follows: Attitudes toward sexual and dating violence were defined as scores on standardized instruments designed to measure sexual violence related attitudes, such as the Rape Myth Acceptance Scale (Burt, 1980) and the Sex Role Stereotyping Scale (Burt, 1980). Violent dating behaviors and sexual violence were defined as responses on self-reports used to assess the use of physical violence in a relationship or the use of coercion in relation to sexual acts. Agency or school reports of dating or sexual violence were defined as records of actions in response to the dating violence (e.g., suspension, criminal charges).
When multiple outcome measures were reported, one attitudinal measure or agency response was chosen according to the following hierarchy: (a) agency reports of dating or sexual violence, (b) self-reports of dating or sexual violence, and (c) attitudinal measures. If multiple attitudinal measures were used, effect sizes were calculated for all measures, and the average effect size was used for inclusion in this meta-analysis.
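The selection rule described above can be illustrated with a short sketch. This is not the authors' code; the outcome-type labels and the function name are hypothetical, and averaging within every outcome type (not only the attitudinal one) is an assumption made here for simplicity.

```python
from statistics import mean

# Priority order taken from the text: agency reports first,
# then self-reports, then attitudinal measures.
# The string labels themselves are hypothetical.
HIERARCHY = ("agency_report", "self_report", "attitudinal")


def select_effect_size(effects):
    """Return a single effect size (ES r) for one study.

    `effects` maps an outcome type to the list of effect sizes the
    study reported for that type. The highest-priority type that is
    present is used; several values of that type are averaged
    (an assumption for the non-attitudinal types).
    """
    for outcome_type in HIERARCHY:
        values = effects.get(outcome_type)
        if values:
            return mean(values)
    raise ValueError("study reports no eligible outcome measure")


# Hypothetical study reporting one self-report and two attitudinal effects:
# the self-report effect is chosen because it ranks higher.
chosen = select_effect_size({"attitudinal": [0.10, 0.20], "self_report": [0.05]})
```

Because each study contributes exactly one value, the resulting effect sizes stay independent of one another across studies.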

Search Strategy
To find relevant studies, a search of the PsycINFO database and the reference lists of relevant articles was conducted. Searches were conducted by combining content terms such as dating violence, sexual violence, sexual abuse, or relationship violence with population-specific descriptors such as children, adolescent, or school. The search was restricted to studies authored in English that were published in book chapters or refereed journals. Titles and abstracts of search results were examined to exclude studies not meeting the above specified inclusion criteria. The remaining studies were examined and excluded if they did not meet criteria.

Overview of Included Programs
Eight studies met the inclusion criteria listed above. This section describes each program in greater detail. Programs are described in terms of stated program goals, setting, method of delivery, duration, sample characteristics, and a verbal account of pertinent findings.
The Safe Dates program (Foshee et al., 1996) is a multicomponent program designed to prevent the first-time occurrence of violence as well as to reduce or eliminate engagement in ongoing dating violence. It is therefore both a primary and secondary prevention program. Both school and community activities are offered as part of the program. The school programming consisted of 10 hr of classroom instruction integrated into required health classes, a poster contest, and a theater performance. Classroom instruction was interactive and designed to influence norms about dating violence, reduce gender stereotyping, and increase conflict management skills and knowledge about available services. Community activities consisted of enhanced services for teenagers involved in dating violence, such as victim support groups, emergency services, and so on. All participants were involved in community activities, but only the treatment condition also received the school programming.
The program was evaluated using eighth- and ninth-grade students in 14 schools in rural North Carolina. Results for the primary prevention group revealed that students who received the intervention were significantly less likely to commit psychological abuse. In the secondary prevention group, students who received the prevention training were significantly less likely to commit psychological or sexual violence. These effects were mediated by changed perceptions regarding negative consequences of dating violence and greater knowledge of services available. No differences were found with respect to gender, likelihood of victimization, or likelihood to commit physical violence (Foshee et al., 1998).
Another multimedia curriculum to prevent date rape, titled "Dating and Sexual Responsibility," was developed by Pacifici, Stoolmiller, and Nelson (2001). The program consists of role-plays, video presentations, and discussions during three 80-min classroom sessions. The themes of the sessions are defining coercion, exploring beliefs and attitudes contributing to coercion, and increasing social skills. Students also individually completed a computerized virtual date story, which left room for students to choose behaviors in situations. After the story, educational material was provided with regard to the choices the participant made.
The effectiveness of the program was evaluated using students from 23 health classes in 2 schools. Latent variable modeling showed the program improved coercive attitudes only for high-risk students, that is, those who had above-average coercive attitudes at pre-test. The authors report effect sizes ranging from small to large. Participants at the pre-test mean had a small effect (.25), those one standard deviation above the mean on coercive attitudes had a moderate effect (.50), and those with extreme attitudes had a large effect at 1.00 (Pacifici et al., 2001).
Feltey, Ainslie, and Geib (1991) used a purely educational approach to primary dating violence prevention. Participating students listened to a 45-min lecture by a rape crisis counselor. The presenter gave information on gender role socialization throughout the life span. Presumed specific causes of rape were presented. These included
(a) a lack of communication between the dating partners, (b) a lack of respect for females on the part of males, (c) peer pressure among males to be sexually active and among females to be more cautious about engaging in sex, (d) aggression among males relative to females' attempts to appear passive in order to be defined as feminine, and (e) situations that provide opportunities to engage in sexual behavior, such as private settings and an atmosphere of sexual expectation. (Feltey et al., 1991, p. 234)
The impact of the program was assessed by comparing participants' responses to hypothetical scenarios that depicted sexual coercion. Participants indicated how acceptable it was to coerce a female to do specified sexual activities (kissing, making out, touching, intercourse) under different circumstances (e.g., after spending money on her, if the female is under the influence of substances). Six weeks after attending the lecture on gender role socialization, participants reported significantly less acceptance of sexual coercion (Feltey et al., 1991).
Another program that had a purely educational focus was evaluated by Hilton, Harris, Rice, Krans, and Lavigne (1998). The program was designed to enhance knowledge of risks and consequences associated with sexual assault and dating violence. All participants received a large group lecture about this topic and chose two out of six workshops, each an hour long, on different topics concerning dating violence. The prevention program was offered to all high school students; the evaluation however encompassed only juniors. One hundred twenty-three students completed the program and all three assessments. Approximately four times as many students attended the sessions but were not included because they missed one or more parts of the program. Such high attrition rates are cause for concern, especially as the authors reported significantly higher pre-intervention scores for those who did not complete the program. This means the students who have attitudes most supportive of violence and lack knowledge related to dating violence were more likely to drop out of the program.
In addition to assessing knowledge about dating violence, the authors also used a measure of date rape attitudes. This was used as the outcome measure included in the meta-analysis later in this article. There was no statistically significant change in attitudes after exposure to the program. It should be noted however that there was a trend toward the endorsement of more rape supportive attitudes immediately after the intervention, which was especially pronounced among the males completing the assessment. This study is one of only two contributing an effect in the undesired direction among the ones reviewed for this meta-analysis (Hilton et al., 1998).
Jaffe, Sudermann, Reitzel, and Killip (1992) conducted a large-scale evaluation of their dating violence prevention program. It consisted of a large group presentation and smaller classroom discussion groups. The study used a large and diverse sample, and the program was planned by committees composed of school personnel and community professionals. However, the research instrument used to evaluate the efficacy of the program is not an established instrument but was developed for the purpose of the evaluation. Statistical analyses were conducted on a per-item basis. No specific results were reported; instead, only approximate p values (e.g., p < .01) were given. This made it necessary to estimate effect size based on the approximate p values. Nine items that asked participants whether it is acceptable for a man to force his date to have sexual intercourse under specific circumstances (e.g., after she got him sexually excited) were used as outcome criteria for the meta-analysis. Four out of nine items changed in the undesired direction after the intervention, that is, participants had greater acceptance of sexual violence after being exposed to the intervention. For the remaining five items, no exact statistics were reported; therefore, the effect size was estimated as zero.
Another program that was reviewed was developed by Lavoie, Vezina, Piche, and Boivin (1995). They evaluated two formats of their teen dating violence prevention program. One sample was presented with a short version of the program. The short version consisted of two classroom sessions that focused on control and abuse, rights in a relationship, and responsibility for abuse. Participants of the longer curriculum watched a movie on dating violence and had to write letters to a fictitious victim and a perpetrator of dating violence. The authors attempted to examine possible effects of program duration. Specific rationales for the assignment of writing letters were not given. Both formats of the program were evaluated using 10th-grade students; however, the samples were drawn from different schools and neighborhoods. This was deemed sufficient for the purpose of this meta-analysis to consider them to be independent samples. Therefore, each version of the program was allowed to contribute a separate effect size to the final analysis.
The outcome measure for both samples was an unspecified attitudes measure administered before and 1 week after the intervention. Both samples showed substantially improved scores after the intervention. One of the study's strengths was the rather large sample size (>200 for both samples). However, omission of important statistics (e.g., sample pre/post-test means, standard deviations) necessitated an estimation of effect size. Furthermore, because nothing was reported about the attitudinal measure, it is impossible to say with certainty what actually changed as a result of the intervention (Lavoie et al., 1995).
Last, a multisession curriculum was evaluated in a study conducted by Avery-Leaf, Cascardi, O'Leary, and Cano (1997). The curriculum was designed to address personal (psychological) as well as sociological factors related to dating violence. The goal was to increase equality between partners, change attitudes regarding gender and violence, and support development of healthy conflict and coping skills. The study used a quasi-experimental design that measured attitudes at two times without any intervention and compared them with attitude scores obtained by students before and after they completed the curriculum. Students who received the prevention found dating violence significantly less justifiable at the time of second assessment, whereas the beliefs of students in the control condition remained unchanged.

Analyses
Whenever possible, outcome measures of the treatment and control groups were evaluated and compared to calculate effect sizes; however, as mentioned previously, a shortcoming of many studies was the lack of a control group. In these cases, the magnitude of the effect size was calculated based on changes in pre- and post-test scores. It was also planned to conduct various analyses concerning several hypothesized moderators. Some specific aspects of programs (e.g., duration) or methods of program delivery (e.g., video vs. presenter) might exert a noteworthy influence on the efficacy of such programs. Hypothesized moderators included duration of the program, modes of presentation (e.g., large assembly lecture, discussion groups), and methodological aspects of the trials (rigorous adherence to experimental standards, standardization of procedures, use of reliable and valid outcome measures, etc.). Other moderating variables could be related to characteristics of the participants, such as gender, education, and socioeconomic status.
Product-moment correlation coefficients (ES r) were computed and used as measures of effect size for the individual studies. This particular measure was chosen over the standardized mean difference as a measure of effect size, because the included studies had used different designs. From some reports, effect sizes were calculated by contrasting pre- and post-intervention scores. Other reports included control groups, and the effect size was then calculated as the contrast between the control and experimental groups after the intervention (adjusted for existing between-condition differences, that is, pre-test scores). Therefore, it seemed logical to use ES r, which represents the strength of association between any two (control vs. experimental or pre- vs. post-test scores) continuous variables.
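As an illustration of how a reported test statistic can be turned into ES r, the sketch below uses the textbook conversion r = sqrt(t² / (t² + df)). The article does not state which conversion formulas were applied, so this is an assumption, and the function name is hypothetical.

```python
import math


def r_from_t(t, df):
    """Convert a t statistic with `df` degrees of freedom to ES r.

    Uses r = sqrt(t^2 / (t^2 + df)) and carries over the sign of t,
    so a contrast in the undesired direction yields a negative r.
    """
    magnitude = math.sqrt(t * t / (t * t + df))
    return math.copysign(magnitude, t)


# Hypothetical example: t = 2.0 with 96 degrees of freedom.
example_r = r_from_t(2.0, 96)  # sqrt(4 / 100) = 0.2
```

The same function applies to a paired (pre- vs. post-test) t and to an independent-groups t; only the degrees of freedom differ.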
Obtained effect sizes for program efficacies in the specific studies were combined and evaluated using meta-analysis to produce an estimated average effect size and 95% confidence interval (CI). For studies that reported multiple outcome items, multiple experimental or control groups, or pre/post-test scores, an average was computed to avoid problems of interdependence, which would arise if one study contributed more than one effect size to the meta-analysis. If standard deviations or exact p values were not reported, data were imputed according to standard statistical procedures (Tabachnick & Fidell, 1996).
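A minimal sketch of how correlation effect sizes are commonly combined follows. It assumes a fixed-effect model with Fisher's z transformation and weights of n − 3; the article does not specify its weighting scheme, so this is one conventional choice and not necessarily the procedure the authors used. All names and the example inputs are hypothetical.

```python
import math


def combine_effect_sizes(rs, ns):
    """Fixed-effect weighted mean of correlations via Fisher's z.

    rs: per-study correlation effect sizes (ES r)
    ns: per-study sample sizes (each must exceed 3)
    Returns the mean r, a 95% confidence interval, and the z test
    of the mean effect against zero.
    """
    zs = [math.atanh(r) for r in rs]           # Fisher r-to-z
    weights = [n - 3 for n in ns]              # inverse variance of each z
    total_weight = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / total_weight
    se = 1.0 / math.sqrt(total_weight)         # standard error of z_bar
    lower, upper = z_bar - 1.96 * se, z_bar + 1.96 * se
    return {
        "mean_r": math.tanh(z_bar),            # back-transform to r
        "ci": (math.tanh(lower), math.tanh(upper)),
        "z_test": z_bar / se,
    }


# Hypothetical inputs, not the data from the reviewed studies.
summary = combine_effect_sizes([0.20, 0.05, -0.04], [150, 400, 250])
```

Weighting by n − 3 gives larger studies more influence, which is why a few large samples can dominate the pooled estimate.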

Results
Seven reports (eight studies) met all inclusion criteria established. Of these, two had control groups, one had a quasi-control group that completed all assessments before the intervention, and five used a within-subjects design. Four studies reported means and standard deviations. Four studies reported only partial data, but effect sizes could be reasonably well estimated by computing minimum t levels based on p values.
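The idea of estimating a conservative effect size from an approximate p value can be sketched as follows. Note the substitution: the article describes computing minimum t levels, whereas this sketch uses the normal approximation r = z / sqrt(N), which needs only the Python standard library. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def r_lower_bound_from_p(p_threshold, n, two_tailed=True):
    """Smallest |r| compatible with a result reported as p < p_threshold.

    Normal approximation: find the z value that corresponds exactly to
    the threshold, then apply r = z / sqrt(n), where n is the total
    sample size. This stands in for the minimum-t approach in the text.
    """
    tail = p_threshold / 2 if two_tailed else p_threshold
    z = NormalDist().inv_cdf(1 - tail)
    return z / math.sqrt(n)


# Hypothetical example: a result reported only as p < .05 with N = 100.
bound = r_lower_bound_from_p(0.05, 100)  # about 0.196
```

Because the true p value may be far below the reported threshold, estimates of this kind tend to understate the real effect.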
The overall weighted mean effect size across studies was significant (ES r = .11; z = 6.65, p < .0001; 95% CI = [.08, .15]). This means across studies, participants had lower scores on dating violence outcome measures after the intervention compared with their pre-intervention scores, or the scores of a control group. A correlation effect size of .11 is considered a small to medium effect. Using the standard conversion d = 2r / √(1 − r²), this correlation effect size translates into a Cohen's d of approximately .22.
Homogeneity of the distribution of effect sizes was tested, Q(7) = 32.66, p < .01. Therefore, the distribution of study effect sizes shows more dispersion than what would be expected due to sampling error alone. Various planned moderation analyses (age, gender of participants, length of intervention, mode of presentation, presenter effects) could not be conducted because the sample of studies compared was too small. Table 1 shows correlations of the proposed moderators with effect sizes found in this study. Although the analyses are not significant, a trend can be observed. It appears that participants' age is most strongly related to the magnitude of effect an intervention has (r = −.42). Programs that had a younger sample were more effective than programs targeting older teenagers. No other correlational pattern of proposed moderators and effect size was observed.
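For readers unfamiliar with the homogeneity test reported above, the following sketch computes a Q statistic for correlation effect sizes. It assumes Fisher's z transformation with weights of n − 3, which the article does not specify, and the function name and example inputs are hypothetical.

```python
import math


def q_statistic(rs, ns):
    """Homogeneity statistic Q for a set of correlation effect sizes.

    Q is the weighted sum of squared deviations of each study's
    Fisher z from the weighted mean z. Under homogeneity it follows
    a chi-square distribution with k - 1 degrees of freedom.
    Returns (Q, degrees of freedom).
    """
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return q, len(rs) - 1


# Hypothetical inputs, not the data from the reviewed studies.
q, df = q_statistic([0.0, 0.3], [103, 103])
```

With eight studies there are 7 degrees of freedom, and the chi-square critical value at the .01 level is 18.475, so the reported Q(7) = 32.66 exceeds it.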

Conclusion and Recommendations
The purpose of this article was to examine the treatment effects of peer-reviewed dating violence prevention programs for use in schools. Using meta-analytic methods, we found teen dating violence primary prevention programs in schools to have a small to moderate positive effect. Overall, students who received dating violence prevention instruction and training scored lower on attitudes about gender, power, and relationships that are known to increase the risk of dating violence perpetration or victimization.
This result was consistent with what we expected based on available research on school-based prevention programs. Notable and concerning, however, was the fact that two out of eight reviewed studies found deterioration in students' attitudes after exposure to dating violence prevention. This possibility has been noted before by several authors, for example, Winkel and de Kleuver (1997), who found that male students showed more acceptance of rape after being exposed to a video discussing sexual assault and punishment for the perpetrator. However, the effect sizes in the undesired direction were quite small in this review (ES r = −.07; ES r = −.04). Nonetheless, caution should be taken to avoid an undesired backlash effect with prevention programs. An exploration of differences in content of programs or methods of delivery would certainly be a worthwhile undertaking to establish what factors might contribute to a program's negative effect. Unfortunately, the small sample size (N = 8) did not permit meta-regression or other analyses that might have been able to shed light on what variables are predictive of direction of study effect size.
Through observation of correlations with proposed moderators, age of participants emerged as a possibly important factor. There was a medium negative relationship between sample age and program effect size. This finding suggests school-based prevention programming is more effective at a younger age and underscores the importance of such programming to not be delayed until teens are of an age that parents or administrators might consider an "appropriate" age to date. Although instruction of middle school students in such matters of dating and sexual violence might be a highly emotional topic for parents and administrators alike, serious efforts should be made to incorporate dating and sexual violence prevention in the curriculum of this age group given the high prevalence of teen dating violence (Davis, 2008).
Note. ES r is the weighted mean effect size; **p < .05.

Limitations
The purpose of a meta-analysis is generally to test whether treatment effects can be found across samples, designs, treatments, and outcomes. A limitation to this meta-analysis was the small sample size. Only eight studies could be included. This is surprising given the large amount of attention to dating violence and the increasing number of programs conducted to prevent violence over the past two decades. However, despite the small sample size, we believe the results make three important points: (a) The majority of teen dating violence prevention programs that undergo empirical study appear to be effective, (b) the vast majority of teen dating violence prevention programs seem to not be subject to empirical testing, and (c) empirical testing is very important to ensure only programs that are safe and effective are used. There are certainly many worthwhile programs out there that are respected and used (e.g., Jones, 1987) but lack peer-reviewed research on their effectiveness. If programs are evaluated, many are not reported in scientific literature but rather in un-refereed publications (e.g., school district reports, city council publications, or non-refereed books). This points to a problematic trend in the field. Many programs seem to be reported only to sponsoring agencies (e.g., a city council). Such practice precludes the sharing of information among the research community. This makes it very hard to get a hold of the actual data, judge a program's quality, and replicate it if it seems suitable for one's needs.
Once eligible studies were identified, several difficulties in comparing programs further imposed limitations on the extent of meta-analytic procedures that could be used. One very important issue is a decreased ability to directly compare programs due to programs differing greatly in their definitions of sexual violence. Some programs define only forceful sexual intercourse as sexual violence, whereas others might use a very broad definition that includes behaviors such as unwanted touching and talking to a person.
Another difficulty reflecting a lack of a common operational definition of violence is that different programs focus on different acts of violence. Some programs emphasize physical violence (e.g., hitting or punching a partner), whereas others might focus on sexual violence (e.g., date rape), and many programs combine different aspects of violence (Cornelius & Resseguie, 2007). Therefore, even when studies provide adequate operational definitions of dating violence and state clearly what aims the investigated program wants to address, it is still challenging to conduct comparisons across programs if there is a wide array of intervention targets for the different programs.
Excessive heterogeneity of programs in terms of differences in intervention duration (e.g., reviewed programs lasted from under an hour to 4 months), intervention method (lecture, discussion, role-plays, etc.), target age range, and outcome measures used to evaluate impact further complicates comparisons of programs. This could be remedied if enough studies were available to test for moderating effects of the above mentioned sources of variance.

Implications for Practice and Research
In summary, a few problems of the field became apparent as a result of this attempt to aggregate dating violence prevention studies. First, there is a lack of a clear definition of dating violence and date rape. Second, most programs are either not empirically evaluated or not reported in refereed and readily available publications. Some reported trials had questionable methodology; for example, five of the eight reviewed studies lacked a control group or standardized procedures. Data are often reported inconsistently; for example, standard deviations, correlations, or exact p values were omitted. This complicated the analyses and sometimes necessitated estimating effects based on approximate p values.
It also became evident from this review that most programs use a multifaceted approach. Programs might include many different activities drawn from distinct theoretical directions. Such an approach is not necessarily problematic. On the contrary, using multiple methods might be more engaging and effective. However, it becomes a problem if elements of the program lack a clear operational definition. Furthermore, with programs that consist of multiple elements, it becomes impossible to separate out the effective from the ineffective components, unless rigorous assessments are conducted after each program element.
There is a clear need for programs to reduce dating violence among students. Because students start dating when they are of high school age, targeting students early in middle school seems most promising. To provide effective and proven programming, more cooperation between researchers and those who deliver programs to students is needed. Currently, programs are mostly developed by workers in the field. These professionals are knowledgeable about how to effectively reach students and deliver messages. When it comes to evaluating programs however, people who are not as familiar with research methodology are less likely to conduct rigorous quantitative evaluations of their programs. If researchers and practicing professionals worked together, both sides could benefit each other and supplement each other's knowledge.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research and/or authorship of this article.