Sexual Arousal as a Function of Stimulus Mode: Implications for Phallometric Assessment

Though phallometric assessment is no stranger to controversy, it is still a procedure which is widely recognized as having considerable value. Researchers have cited it as a reliable means to assess age and gender preferences and to assess a preference for coercive versus consenting sexual activities. It is also the only method which allows a determination of an offender’s ability to inhibit deviant sexual arousal, a factor of principal importance in assessing risk of reoffending. This procedure is weakened, however, by the problem of low arousal, and often results are deemed too low to be interpreted. A factor contributing to low arousal may be the widespread availability of pornographic material on the internet, for this may desensitize participants to weaker stimuli used in some labs. In response to ethical concerns some labs have adopted the use of audio stimuli alone, and this may compromise the procedure. This study compares arousal to consenting adult heterosexual, adult female rape, and heterosexual pedophilia themes in response to audio versus slide versus video stimuli. Results from 142 incarcerated inmates reveal that visual stimuli are most effective and that the use of audio stimuli alone often yields sexual arousal profiles which are too low to be interpretable.


Introduction
Phallometric assessment is a procedure which has generated some controversy and invited concerns from researchers who have raised many legitimate and challenging questions, perhaps principally because of widely-acknowledged inconsistencies in test administration and interpretation [1][2][3][4][5][6][7]. There are nonetheless a great many researchers and clinicians who recognize the worth of this procedure on a number of dimensions and advocate for its use in both research and clinical settings [8][9][10][11][12][13]. The differing opinions which have been ventured with regard to the value of phallometric assessment should not be viewed with any discomfort, however, for as so eloquently expressed about a different contentious issue [14], "It is the nature of a maturing field of study that opposing views are expressed and, indeed, the clear articulation of contrary opinion is the lifeblood of progress in science." In their thorough and masterful review of the theory, assessment, and treatment of sexual deviance, Laws and O'Donohue observed that "there is an abundance of research supporting both the discriminative and predictive validity of PPG" and they concluded that "PPG remains the most trusted assessment tool" [15].
Particularly with regard to a sexual preference for children these authors ventured, "we must conclude that phallometric assessment has emerged as the most reliable and valid procedure to assess this preference." This is certainly consistent with the earlier meta-analysis by Hanson and Bussiere who concluded that phallometric measures of arousal to child stimuli are the best predictors of recidivism [16]. Citing more recent studies such as Hanson and Morton-Bourgon's meta-analytic research which also indicated that deviant arousal to children is related to higher risk of reoffending against children [17], Wilson et al. commented, "Perhaps the best means of objectively measuring deviant sexual interest is the phallometric test" [7]. Greater arousal to depictions of coercive sex has been observed in convicted rapists [13,18], and although Wilson et al. identified concerns about a lack of standardization they nonetheless observed that "many practitioners continue to view phallometry as a reliable means of assessing deviant sexual interests." In what is perhaps the most recent comprehensive examination of the assessment and treatment of sexual offenders Meridian and Jones noted that "research outcomes consistently report reasonable correlation coefficients between self-reported and phallometrically measured arousal" [19].
They also reminded readers that "metaanalytic studies continue to point to the results of phallometric assessment as a valid predictor of future risk." As well, these authors commented on the face validity of the procedure and suggested that offenders who deny a sexual attraction to children have little choice but to accept the existence of a problem when confronted with phallometric results which clearly demonstrate this preference. All in all, the authors concluded, "Although the early enthusiasm for phallometric assessment has slightly faded since then [its introduction], no other assessment method has really challenged its place." The value of phallometric assessment in diagnosing paraphilias has of course been noted by many researchers [13,20,21], and it has even been described as an effective means by which sexual offenders' minimizations and outright denial can be confronted [12,22]. The benefits of using phallometric data to confront denial are almost axiomatic, and as an example it is difficult for a pedophile to argue that he is sexually aroused only by adults when phallometric evidence reveals a strong sexual preference for children over adults (something many treatment providers may have used to good therapeutic advantage). In a similar manner, as noted in the ATSA Practice Standards and Guidelines [23], phallometric testing may also increase client disclosure, and in the most recent version of these guidelines [24] it is acknowledged that the phallometric procedure has value in "exploring the reliability of client self-report." In their recent study Bradford et al. described phallometric testing as "an objective, physiological indicator of deviant sexual behaviour that provides reliable evidence in the absence of an accurate diagnosis" [20], and Rice et al. concluded that "Regardless of the causes of anomalous sexual interests, it is clear that phallometric measurement remains the best available technique for their study" [25].
This impression was clearly held by Lykins et al., who offered the view that "Phallometric testing is widely considered the best psychophysiological procedure for assessing erotic preferences in men" [26].
Of particular significance in examining the value of phallometric assessment is the research by Clift et al., which noted that "an inability to control deviant arousal to stimuli that depict forced and non-forced sexual interactions with male and female children is related to sexual recidivism" [27]. A similar conclusion was earlier advanced by Howes in a study which compared the sexual arousal profiles of 50 incarcerated sexual offenders with 50 incarcerated non-sexual offenders [18].
In this study only 32% of the rapists and just 10% of the pedophiles were able to inhibit their deviant sexual arousal in the phallometric testing situation when directed to do so, whereas 98% of the control group of non-sexual offenders readily demonstrated this ability. It was apparent that those convicted of sexual offences not only demonstrate deviant sexual arousal profiles but they may have what might be conceptualized as some sort of impulse control deficit (also described by Neutze et al. as "sexual self-regulation problems") [28]. It is only through the use of phallometric testing that the extent of this deficit can be made evident.
Howes concluded by commenting that phallometric assessment offers a means of clearly identifying any individual's sexual preferences (in terms of gender, age, and level of violence) and, of perhaps even more importance in the assessment of continuing risk, it provides a means of determining the extent to which offenders are able to inhibit their deviant arousal. Again, there is simply no other laboratory measure of the extent to which this control is or is not present.
In short, although it is not universally embraced there nonetheless remains widespread acceptance and recognition of the value of phallometric assessment. This is certainly an assessment procedure which has come a long way since it was first devised (under communist government pressure) by Dr. Kurt Freund in the mid-1950s in the former Czechoslovakia to identify conscripts who were claiming a homosexual orientation to avoid military service. The real value of phallometric assessment may be most notably compromised by the problem of low arousal, however, for when testing does not elicit significant arousal (with the most commonly accepted though not universally adopted measure of significance being 20% of a full erection [3]), interpreting such results is beset with problems. When full arousal is obtained during phallometric testing the problems in data interpretation are of much lesser magnitude, but in cases of low arousal there has been no consistent approach to interpreting these results. It was in fact to address the problem of low arousal that Howes [29] obtained data from nine North American correctional facilities which revealed that the average circumferential change from flaccidity to full erection in the sample of 724 men who achieved full arousal was 32.6 mm with a standard deviation of 8.8 mm, and these scores fell in a normal distribution. Having these normative data allows for the interpretation of low arousal scores, and as an example it was noted that "a circumferential change score of 10.6 mm (even in the absence of full erection) can be regarded as significant [using 20% full erection as the criterion of significance] and interpretable at the 0.01 confidence level." The problem of low arousal during phallometric testing has been noted in the research literature for decades [30,31], and in many cases this has been a substantial problem rather than merely a trifling inconvenience.
Malcolm, Andrews, and Quinsey discarded 48% of their sample because of low responding [32], and Haywood, Grossman, and Cavanaugh indicated that all 51 participants in phallometric testing (including 27 control participants) revealed what was described as low arousal [33].
More recently Michaud and Proulx [11] eliminated data from 264 potential participants (out of an original sample of 745) because they were determined to be low responders or non-responders, and Lykins et al. noted that previous studies in their lab had excluded from analysis all subjects who failed to meet a criterion threshold of arousal [26]. In fact it was "to mitigate the problem of low responses in phallometric studies" that Bradford et al. undertook research examining the effect of prescribing 50 mg sildenafil (Viagra) to phallometric testing participants [20]. While this study concluded that the results "support the use of sildenafil as a pre-treatment enhancement for phallometric testing to overcome the problem of low responders," which is certainly a valuable finding, the fact that the study was conducted in a medical facility might unfortunately limit the extent to which this finding has practical application in non-medical or outpatient or even prison settings.
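The 10.6 mm figure quoted earlier from Howes [29] can be reproduced with a short calculation. The sketch below rests on an assumption that is not spelled out in the text: that the figure is 20% of the full-erection value lying at the one-tailed 99th percentile of a normal distribution with the reported mean of 32.6 mm and standard deviation of 8.8 mm. The function and constant names are illustrative only.

```python
from statistics import NormalDist

# Normative values reported by Howes [29] for 724 men who reached full arousal.
MEAN_FULL_ERECTION_MM = 32.6
SD_FULL_ERECTION_MM = 8.8
CRITERION_FRACTION = 0.20  # 20% of full erection as the criterion of significance


def conservative_threshold_mm(confidence: float = 0.99) -> float:
    """Return 20% of the full-erection value at the given upper percentile.

    Assumption (not stated in the source): a one-tailed percentile of a
    normal distribution fitted to the normative mean and standard deviation.
    """
    norms = NormalDist(mu=MEAN_FULL_ERECTION_MM, sigma=SD_FULL_ERECTION_MM)
    upper_full_erection_mm = norms.inv_cdf(confidence)
    return CRITERION_FRACTION * upper_full_erection_mm


if __name__ == "__main__":
    print(f"{conservative_threshold_mm():.1f} mm")
```

Under this reading, a change score that reaches 20% of even an unusually large full-erection value can be treated as significant without full erection having been observed, which is consistent with the quoted 10.6 mm.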
One factor which may be contributing to low arousal in phallometric testing, as suggested by Lykins et al., is the advancement of technology such that the average person's access to virtually every form of explicit pornography via the Internet is relatively unhindered [34]. As these authors note, "The Internet provides access to a virtual smorgasbord of pornographic material, both professional and amateur, making available nearly anything a person might find sexually appealing (both legal and illegal) at the click of a mouse." This obvious truth leads one to wonder if commonly-used phallometric stimuli might have less and less potential to evoke arousal given such widespread exposure to more powerful erotic stimuli. ATSA specifically addressed the issue of low arousal with the observation that "Stimuli that are more explicit appear to produce better discrimination between individuals who sexually abuse and control subjects than less explicit stimuli" [23]. In the first of their two 2010 studies Lykins et al. [26] maintained a similar position by commenting that in terms of best clinical practice "It may also be useful to use stronger sexual stimuli for phallometric testing." In their studies they used audio stimuli accompanied by slides (i.e., Narratives-Slides Test) because this "was designed specifically to be a more powerful test in terms of eliciting penile tumescence than its predecessors." The predecessors used for comparison in the first Lykins et al. study were 14-second film clips of nude children or adults smiling and walking towards the camera (i.e., Walking Nudes Test), and the subjects in these video stimuli were described as "not engaging in any overtly sexual or even flirtatious behavior." It might even usefully be noted that these researchers used three projection screens with the goal of enhancing the effects of both stimulus modes.
The Narratives-Slides Test simultaneously presented a full-length front view, a full-length rear view, and a close-up of the genital region of the nude model who corresponded in age and gender to the subject described in the narrative. Results revealed that the Narratives-Slides stimuli were more effective in eliciting sexual arousal, and it appears probable that the more sexually explicit nature of these stimuli accounted for this difference.
According to anecdotal reports the use of audio stimuli alone in phallometric assessment has been adopted by some testing labs, principally it would seem to address the legitimate concern that the use of any visual stimuli re-victimizes the subjects (especially when the subjects are children). Whether or not audio stimuli alone are sufficient to elicit significant arousal in most participants has not been properly established, however, and this practice would be at odds with the recommendations of researchers such as Laws et al., who concluded that the "use of images best discriminates age and gender preference" [35,36].
It has been suggested, somewhat facetiously but nonetheless cleverly, that the best evidence of the relative weakness of audio sexual stimuli and the relative strength of visual stimuli can be found in the marketplace. If one were to patronize an 'adult' store and ask for their pornographic CD section one would most likely be met with a blank stare. No such audio stimuli are typically available because there is simply no identified market for them, this seemingly providing an indication of most men's erotic preference for video stimuli. Given the choice between looking through a keyhole versus listening at a keyhole it would seem that most men anticipate finding the former more sexually stimulating/arousing. The issue of the relative strength of various phallometric testing stimulus modes is one that lends itself to further investigation, however, and such is the purpose of the current study.

Method
Archival data from 142 federally-incarcerated offenders who participated in phallometric testing at Stony Mountain Institution, a Canadian federal penitentiary, were available for analysis. Ninety-two of these offenders were incarcerated for sexual offences and 50 (who comprised the control group in an earlier study) were incarcerated for non-sexual offences.
By way of explanation the non-sexual offenders had volunteered to participate in phallometric testing (anonymously) solely for research purposes, with only a research participant certificate as an incentive. It should be noted that all participants in the current study voluntarily consented to take part, even the sexual offenders, and the fact that participation wasn't mandatory was affirmed to them all. All participants signed a consent form prior to testing, a form which included their consent to view sexually-explicit child and adult pornography which they might find offensive, and the research was subjected to an ethical review.
All participants completed a single phallometric testing session approximately two hours in duration, with sexual arousal being measured by a mercury-in-rubber strain gauge. Participants were placed alone in the viewing room to afford them privacy, but they remained on video camera throughout the actual testing procedure to ensure proper attention to the stimuli.
A number of themes were presented during testing, from adult female bondage to consenting homosexual adult as well as standard age and gender preferences, but only the data from three of these themes (i.e., consenting heterosexual adult, heterosexual pedophilia, and adult female rape) were examined in the current study. This was because it was only for these three sexual themes that data were available showing arousal to audio, slide, and video stimuli. None of the other sexual themes which formed a part of the standard phallometric testing procedure offered data from all three of these stimulus modes.
Time of exposure to the slide and video stimuli was roughly equivalent across the three sexual themes. For the slide stimuli, 12 slides at 32 seconds each (i.e., 6 minutes 24 seconds in total) were presented for all three sexual themes. In terms of exposure to video stimuli (with no audio component) the time of exposure was 8 minutes for both consenting adult heterosexual and heterosexual pedophilia, with 8 minutes 40 seconds exposure to video stimuli for the adult female rape theme.
More substantial differences in time of exposure to the audio stimuli existed such that the time of exposure was 4 minutes 30 seconds for heterosexual pedophilia, 6 minutes 50 seconds for adult female rape, and 8 minutes for consenting heterosexual adult.
These differences were a function of the fact that time of exposure does not appear to have been a consideration when the standard audiotape stimuli were originally created years ago. In cases where audio and slide stimulus modes were combined the resulting data were not used in this current analysis since two of the sexual themes did not have this component.
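The exposure times described above are easier to compare when tabulated in seconds. The sketch below simply transcribes the durations given in the text and reports, for each sexual theme, which stimulus mode had the shortest exposure; the dictionary layout and function names are illustrative and not part of the original procedure.

```python
# Exposure times in seconds, transcribed from the Method text.
SLIDE_SECONDS = 12 * 32  # 12 slides at 32 seconds each

EXPOSURE_SECONDS = {
    "consenting heterosexual adult": {
        "audio": 8 * 60,
        "slide": SLIDE_SECONDS,
        "video": 8 * 60,
    },
    "adult female rape": {
        "audio": 6 * 60 + 50,
        "slide": SLIDE_SECONDS,
        "video": 8 * 60 + 40,
    },
    "heterosexual pedophilia": {
        "audio": 4 * 60 + 30,
        "slide": SLIDE_SECONDS,
        "video": 8 * 60,
    },
}


def format_duration(seconds: int) -> str:
    """Render a duration in seconds as minutes and seconds."""
    return f"{seconds // 60} min {seconds % 60:02d} s"


def shortest_modes(times: dict) -> list:
    """Return the stimulus mode(s) with the shortest exposure for one theme."""
    shortest = min(times.values())
    return sorted(mode for mode, t in times.items() if t == shortest)


if __name__ == "__main__":
    for theme, times in EXPOSURE_SECONDS.items():
        summary = ", ".join(
            f"{mode} {format_duration(t)}" for mode, t in times.items()
        )
        print(f"{theme}: {summary}; shortest: {shortest_modes(times)}")
```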
Most of the audio and slide stimuli used were originally provided by the Regional Treatment Centre (RTC) at Kingston, Ontario, a federal correctional facility and research centre, and these stimuli have previously been used in research originating in this and other centres. The slide stimuli portray full nudity in all subjects, and the audio stimuli are narratives of sexual activities described with a male voice.
The video stimuli used to examine heterosexual pedophilia were also provided by the RTC and they were clearly confiscated child pornography films. The video stimuli used to examine adult female rape were segments from two mainstream films (i.e., Death Wish, and The Accused) with the soundtracks being muted, and the video stimulus used to examine consenting adult heterosexual was a standard heterosexual consenting adult pornographic film.
These video stimuli have been used in the principal author's facility for more than twenty years and considerable normative data are available. In all cases stimuli examining new themes were not presented until complete detumescence had occurred, and neutral audio stimuli were sometimes employed to facilitate this.
The average age of the offenders in this study was 31.4 years (ranging from 19 to 58), with 44.7% being identified as white, 47.8% identified as North American Aboriginal (reflecting the disproportionate presence of Aboriginals in the Canadian prison population), and the remainder identified as black, East Indian, or 'other'. Sixty-three percent of those incarcerated for sexual offences had adult victims and the remainder had child victims, and the average sentence length was 50.1 months.
With regard to marital status, 42.1% of the offenders were single at the time of their arrest, 35.3% were married or living common-law, and 22.6% were separated or divorced. A substance abuse history was noted for 76.2% of the offenders, and 27% had one or more previous convictions for sexual or non-sexual violence. As a matter of some interest, 29.3% of the sexual offenders did not admit to their guilt with regard to the offences for which they had been convicted.
Data were not obtained for all of the 142 offenders in every category, the explanations for this ranging from the occasional practice of brief, focussed testing (where the full range of sexual themes was not examined) to infrequent instances where the offender independently ended the testing session prior to completion.
Only where an offender's arousal to all of the three stimulus modes depicting the three sexual themes was measured were these data included in the final analysis. Those offenders for whom data were obtained for only one or two of the three stimulus modes were eliminated from the final analysis in that category, which accounts for the slight differences in group sizes, and this ensured that perfect within-groups comparisons could be made.
In short, the current study was undertaken to compare sexual arousal (in millimeters of circumferential change) in response to audio vs. slide vs. video depictions of three sexual themes presented to the same group of offenders.

Results
The circumferential change responses of 120 offenders were obtained to the theme of consenting adult heterosexual sexual activities as portrayed by audio, slide, and video stimuli (Figure 1). The circumferential change responses of 128 offenders were obtained to the theme of adult female rape as portrayed by audio, slide, and video stimuli (Figure 2). The circumferential change responses of 118 offenders were obtained to the theme of heterosexual pedophilia as portrayed by audio, slide, and video stimuli (Figure 3). Each of the three sexual themes with three stimulus modes was tested using repeated measures (within-subjects) one-way ANOVAs, thus allowing for comparisons to be made of the same subjects under different conditions (Table 1).
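For readers unfamiliar with the within-subjects design, the following is a minimal sketch of how the F statistic of a one-way repeated-measures ANOVA is computed for one sexual theme across the three stimulus modes. The scores are hypothetical values invented purely for illustration (they are not the study data), and the function name is an assumption. A p-value and the follow-up pair-wise tests would normally come from a statistics package and are omitted here.

```python
def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA on a subjects x conditions table.

    `scores` holds one row per participant and one column per condition
    (here: audio, slide, video). Between-subject variability is removed
    from the error term, which is what makes the comparison within-subjects.
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    return {
        "ss_conditions": ss_conditions,
        "ss_subjects": ss_subjects,
        "ss_error": ss_error,
        "df_conditions": df_conditions,
        "df_error": df_error,
        "f": f_stat,
    }


# HYPOTHETICAL circumferential change scores in mm (audio, slide, video).
HYPOTHETICAL_SCORES = [
    [4.0, 9.0, 15.0],
    [2.0, 6.0, 12.0],
    [6.0, 10.0, 20.0],
    [3.0, 8.0, 13.0],
    [5.0, 7.0, 18.0],
]

if __name__ == "__main__":
    result = repeated_measures_anova(HYPOTHETICAL_SCORES)
    print(f"F({result['df_conditions']}, {result['df_error']}) = {result['f']:.2f}")
```

Because each participant serves as his own control, only offenders with scores in all three stimulus modes can enter such an analysis, which is why incomplete cases were dropped theme by theme.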
Three pair-wise comparisons were then performed using Tukey HSD tests to determine the significance of the observed differences between the three stimulus modes for each of the sexual themes (Table 2). The results also reveal that when 6 mm of circumferential change is used as the cut-off for excluding scores from interpretation (i.e., approximately 20 PFE [percent full erection], based on the published normative data of Howes [29], which identify an average circumferential change to full erection of 32.6 mm with a standard deviation of 8.8 mm), 56.6% of all offenders did not surpass this criterion score when data from only the audio stimuli were examined. Conversely, only 12.3% of all offenders did not surpass this criterion score when data from only the video stimuli were examined.
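The exclusion criterion can be illustrated with a brief sketch that converts a circumferential change into percent full erection (using the normative mean of 32.6 mm) and computes the share of participants who do not surpass the 6 mm cut-off. The scores listed are hypothetical and serve only to show the calculation. Treating a score exactly equal to the cut-off as not surpassing it is an assumption, as are the function names.

```python
MEAN_FULL_ERECTION_MM = 32.6  # normative mean reported by Howes [29]
CRITERION_MM = 6.0            # cut-off for interpretability used in the Results


def percent_full_erection(change_mm, full_mm=MEAN_FULL_ERECTION_MM):
    """Express a circumferential change as a percentage of full erection."""
    return 100.0 * change_mm / full_mm


def share_not_surpassing(changes_mm, criterion_mm=CRITERION_MM):
    """Return the proportion of scores that do not exceed the criterion.

    Assumption: a score equal to the criterion counts as not surpassing it.
    """
    not_surpassing = sum(1 for change in changes_mm if change <= criterion_mm)
    return not_surpassing / len(changes_mm)


# HYPOTHETICAL audio-only change scores in mm, for illustration only.
HYPOTHETICAL_AUDIO_MM = [2.0, 5.5, 6.0, 7.1, 12.4, 30.0, 3.3, 18.0]

if __name__ == "__main__":
    print(f"6 mm corresponds to {percent_full_erection(CRITERION_MM):.1f} PFE")
    share = share_not_surpassing(HYPOTHETICAL_AUDIO_MM)
    print(f"{100 * share:.1f}% of the hypothetical scores are uninterpretable")
```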

Discussion
It is abundantly clear from these data that higher sexual arousal is consistently obtained in response to visual stimuli and that audio stimuli yield the lowest sexual arousal responses. In the case of all three of the sexual themes tested the audio stimuli elicited arousal at a level not high enough for interpretation for more than half of the offenders, whereas the video stimuli elicited arousal at a level high enough for interpretation in roughly seven out of eight cases. Those who use only audio stimuli are thus in most cases likely to find phallometric testing to be of no help whatsoever in providing interpretable sexual arousal profiles. This will not only be a great waste of clinicians' time, in most cases, but it also compromises the worth of a testing procedure which, as the research literature so commonly attests, has a demonstrated ability to identify convicted sexual offenders who are at high risk to reoffend.
The possibility that the unavoidable differences in duration of presentation of the audio stimuli may have influenced arousal must of course be considered to account for some of the variance observed, for as indicated not every stimulus was presented for exactly the same amount of time. One might well predict there may be an effect of stimulus duration, but aspects of the results suggest no such effect was demonstrated. For one thing, it was commonly observed (by viewing the arousal tracings) that the highest levels of arousal most often occurred fairly quickly after the presentation of a stimulus (i.e., within a minute), and additional time of presentation did not heighten this arousal. As well it should be noted that for two of the three sexual themes (i.e., adult female rape, consenting heterosexual adult) the time of exposure to audio stimuli was not the shortest of the three stimulus modes, and in spite of this longer exposure these audio stimuli still elicited the lowest levels of arousal.
The finding that in most cases audio stimuli do not appear to have the power to elicit interpretable sexual arousal is distressing in that the use of audio stimuli alone allows an escape from the sensitive issue of using visual pornography, especially child pornography. It has been argued that the use of pornography re-victimizes those being portrayed, and this is both a legitimate and honourable consideration. It is beyond the scope of this paper to attempt to do justice to the debate on the ethics of using confiscated pornography in phallometric assessment, but the cost of an outright rejection of such stimuli needs to be examined. Clearly it is only the best of intentions which motivates the argument that visual pornography should not be used, but doing this is likely to critically weaken phallometric assessment. It is very easy to embrace the view that one is taking the moral high ground when one argues against the use of visual pornographic stimuli but, and the conjunction "but" cannot be emphasized enough, when the principal goal of phallometric assessment is or should be to assist in identifying high-risk sex offenders who represent an unacceptable risk of re-offending if released to the community then perhaps the principle of harm reduction needs to be afforded the greatest weight.
The frequently-evidenced ability of phallometric assessment to identify convicted sexual predators who continue to represent an ongoing risk should not be sacrificed because of well-intentioned efforts to restrict it to using only audio stimuli. For this assessment procedure to perform at its best it is necessary to use stimuli with the most power to evoke deviant arousal (i.e., video stimuli) and, of perhaps greater importance, stimuli which best challenge the ability of offenders to inhibit deviant arousal. It is, after all, the inability to inhibit deviant arousal which has been identified in the literature as the principal distinguishing feature between sexual offenders and nonsexual offenders in terms of their sexual arousal, for both groups commonly reveal some deviant arousal. On moral grounds the use of videotaped pornographic stimuli invites an almost visceral reaction in many people, but as sympathetic as we should assuredly be with this concern a compelling argument can be made that when the primary purpose of using such stimuli is to assist in identifying high-risk sexual offenders who can thus be kept incarcerated then the protection of the community is best being accomplished.
Phallometric testing is of course not a stand-alone measure in determining risk of sexual reoffending and it should be viewed as simply one of many sources of information which need to be considered (e.g., actuarial measures, treatment program participation, victim empathy, disinhibiting substance abuse problem, viable relapse prevention plan, etc.). It can be, however, a very strong component of any thorough sexual offender assessment, especially to the extent that it offers a completely objective measure of deviant arousal and the ability (or inability) to inhibit this. The omission of this testing procedure from the sexual offender assessment battery, or the weakening of this procedure by using only audio stimuli, may seriously compromise our ability to assess risk of further sexual aggression.