Developing the Premonitory Urges for Tic Disorders Scale – Revised (PUTS-R)

Background. Patients with Gilles de la Tourette syndrome (GTS) or chronic tic disorders frequently experience premonitory urges prior to tics. The 'Premonitory Urges for Tic Disorders Scale' (PUTS) is commonly used in order to assess urge severity in patients with tics. Several studies suggest that the PUTS might measure more than one dimension of urges. These include the quality and severity of premonitory urges. Methods. This study aims to replicate and extend previous findings concerning the psychometric properties of the PUTS and its underlying dimensions in a large sample of 241 patients with GTS including both adults (n = 93; mean age = 34.2 ± 12.84; 73 male) and minors (n = 148; mean age = 11.8 ± 2.86; 123 male), pooled from three different recruitment sites. Results. Data analysis confirmed good reliability across the PUTS items for both minors and adults and acceptable item characteristics for items 2–8. A factor analysis of items 1 –

The vast majority of adolescents and adults with Gilles de la Tourette syndrome (GTS) experience 'premonitory urges', or simply 'urges', immediately prior to a tic. These urges have been reported in approximately 77% of GTS patients over 13 years old, and in 90% of those over 18 years (Bliss, 1980; Brandt, Beck, Sajin, Baaske, et al., 2016; Kwak, Dat Vuong, & Jankovic, 2003; Sambrani, Jakubovski, & Muller-Vahl, 2016). Awareness of these feelings of premonitory urge tends to increase in children as they become older (Banaschewski, Woerner, & Rothenberger, 2003), potentially as a consequence of the normal development of self-awareness/bodily urges. For most, these unpleasant, involuntary sensations are accompanied by a feeling of unease or anxiety, which can be relieved by executing a tic (Brandt, Beck, Sajin, Baaske, et al., 2016; Cavanna & Nani, 2013; Kwak et al., 2003). Premonitory urges are a core feature of GTS for many, and awareness of them is critical for behavioural interventions such as habit reversal therapy (HRT) (Azrin & Nunn, 1973) and extensions of this, such as comprehensive behavioural intervention for tics (CBIT) (Piacentini et al., 2010). Hence, to enhance our understanding of GTS and improve treatment options, there is a clear need to develop reliable and accurate approaches to measuring premonitory urges.
The most commonly used approach to assessing urges in GTS is through the use of the Premonitory Urges for Tics Scale (PUTS). This scale was developed by Woods, Piacentini, Himle, and Chang (2005) and is a short, easily administered questionnaire. PUTS has been shown to have good to acceptable reliability in individuals over 10 years old and good convergent validity (Brandt, Beck, Sajin, Anders, & Munchau, 2016). However, internal consistency in children younger than 11 years appears to be questionable (Raines et al., 2018; Steinberg et al., 2010; Woods et al., 2005). Correlations in the low-medium range between tic severity scores and the PUTS are in line with the assumption that these two phenomena are related but distinct constructs (Brandt, Beck, Sajin, Anders, et al., 2016; Ganos et al., 2012; Raines et al., 2018). However, substantial positive correlations of the PUTS with obsessive-compulsive symptoms and anxiety (Rajagopal & Cavanna, 2014; Reese et al., 2014; Steinberg et al., 2010; Woods et al., 2005) point towards low discriminant validity of at least some of the PUTS items.
Originally, the PUTS questionnaire was designed to assess a one-dimensional construct, that is premonitory urges. However, factor analyses suggest that the PUTS measures at least two dimensions (Brandt, Beck, Sajin, Anders, et al., 2016; Raines et al., 2018): (1) the intensity or frequency of urge phenomena and (2) the sensory quality of urges. Hence, the development of different PUTS subscales might be useful. Furthermore, studies investigating changes in premonitory urge intensity after treatment have often failed to show any differences when this was assessed using the PUTS (e.g., Houghton et al., 2017; Nonaka et al., 2015). While it is possible that urges are not affected by treatment, the findings could also be due to a lack of sensitivity in the measure. Specifically, relatively few items of the scale may measure urge intensity and these items may be too vague to assess subtle changes.
The main aim of the current study is to confirm and extend previous findings regarding item characteristics, reliability and underlying dimensions of the PUTS both in minors and adults with GTS, in a sample large enough to produce robust results. To our knowledge, this study is also the first to investigate psychometric characteristics of the PUTS based on item response theory (IRT). The second aim of the study is to use the results to make clear recommendations for a revised version of the PUTS and to develop an item pool that can be tested to develop an urge scale based on more favourable psychometric properties. In doing so, this research is the first step in developing a revised PUTS.

Participants and clinical assessments
The data used in this study represent secondary analysis of PUTS scores, which were collected as part of routine assessment along with the Yale Global Tic Severity Scale (YGTSS) during a range of different experiments performed across three sites.
The study included 241 patients with a confirmed diagnosis of a tic disorder according to DSM-IV-TR (DSM-IV-TR, 2000) or DSM-5 criteria (DSM-5, 2013), of whom 93 were adults (mean age = 34.2 ± 12.84; 73 male) and 148 were minors under the age of 18 years (mean age = 11.8 ± 2.86; 123 male). At the time of assessment, n = 72 adults fulfilled criteria for GTS, n = 1 for a chronic phonic tic disorder and n = 20 for a chronic motor tic disorder. Among the minors, n = 131 fulfilled criteria for GTS, n = 3 for a chronic phonic tic disorder and n = 14 for a chronic motor tic disorder. Of the minors, 98 were between the ages of 11 and 18 (mean age = 13.5 ± 1.9; 76 male) and 50 were 10 years or younger (range = 6-10; mean age = 8.6 ± 1.23, 50 male). Data were pooled from three different sites and across several experiments: Hannover, Germany (n = 96; collected 2013-2015) and Lübeck, Germany (n = 80; collected 2011 and Nottingham, UK (n = 65; 2013. Of all the studies from which these data were pooled, only one, with a sample size of 15, excluded participants based on comorbidity. All other studies included a representative sample of GTS participants, some of whom had comorbidities or were taking medication. All patients and parents, respectively, had given their written informed assent/consent prior to taking part in the primary study. The data were anonymized before pooling. Each primary study was reviewed and approved by the respective local ethics committee and conformed to the Declaration of Helsinki.

Questionnaires
All patients filled out the PUTS (Woods et al., 2005), a 10-item questionnaire assessing urge intensity on a 4-point Likert scale (range = 10-40). Small children received help from their parents filling out the PUTS. Parents were instructed as follows: 'Please help your child fill out this questionnaire. Please explain to your child what each question means and discuss with them which response might be most accurate. If you have any questions, please ask the experimenter.' German centres used the translated and validated German version of the PUTS (Rössner, Müller-Vahl, & Neuner, 2010). In addition to the complete PUTS scale (referred to as PUTS 1-10), we explore the 9-item PUTS score (PUTS 1-9) because item 10 ('I am able to stop my tics even if only for a short period of time') is commonly dropped from the overall questionnaire score (Reese et al., 2014; Woods et al., 2005).
The Yale Global Tic Severity Scale (YGTSS) was also completed for 236 of the 241 patients. The YGTSS is a structured and validated interview which measures total tic severity (TTS; range = 0-50) during the past 7 days, with good reliability (Storch et al., 2005). Total tic severity is calculated from the sum of the subscores which measure the number, frequency, intensity, complexity and interference of motor and phonic tics. A separate item which is not factored into TTS assesses overall impairment and is scored out of 50. Subscores were available for 69/93 adults and 142/148 minors for further analyses.

Statistical analysis
Of all PUTS values, 1% were missing. Missing values were not replaced because the rate was <5% (Bennett, 2001; Schafer, 1999). Item difficulty and discrimination were tested according to classical test theory (CTT) in SPSS (IBM Corp, 2016) and according to IRT in R (R Development Core Team, 2017).
Item difficulty indicates how many participants have answered an item above the mean of the scale. For items with multiple response options, item difficulty is calculated as follows: $p = \frac{\sum_i x_i}{k \cdot n}$ (Bortz & Döring, 2013; Moses, 2017). Values for each item are summed up across participants ($x_i$) and are divided by the number of participants ($n$) multiplied by the item response levels ($k$). Items that all participants answer with '4' or '1' are not useful because they do not discriminate between participants. In CTT, item difficulty can range between 0 and 1; items with a difficulty under 0.2 or over 0.8 are commonly excluded from questionnaires. In IRT, a similar parameter is called 'threshold' for items with multiple response options. Thresholds (b values) indicate the point of ability or underlying construct where participants switch from one response option to the next. If the item measures the intensity of an underlying construct, the probability of switching to a higher response option should be ordered according to the response option (i.e., patients with very high urge intensities should be more likely to select 4 as a response option than 3). Non-ordered thresholds indicate that patients with higher symptom severity do not necessarily select higher values on this particular item and that the item might therefore not be ideal to measure the construct of the scale.
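The CTT item difficulty calculation described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the function names, the handling of missing values as `None`, and the example responses are all assumptions made for the sketch.

```python
from typing import Optional, Sequence


def item_difficulty(responses: Sequence[Optional[int]], k: int = 4) -> float:
    """CTT item difficulty p = sum(x_i) / (k * n) for a single item.

    `responses` holds one answer per participant, coded 1..k.
    None marks a missing answer; it is skipped, not imputed.
    """
    observed = [x for x in responses if x is not None]
    if not observed:
        raise ValueError("no observed responses for this item")
    return sum(observed) / (k * len(observed))


def flag_extreme(p: float, low: float = 0.2, high: float = 0.8) -> bool:
    """True if p falls outside the commonly used 0.2-0.8 band."""
    return p < low or p > high


# Hypothetical responses for illustration only (not study data).
item_a = [4, 4, 3, 4, None, 4, 3, 4]  # most participants choose 3 or 4
item_b = [2, 3, 2, 3]

p_a = item_difficulty(item_a)
p_b = item_difficulty(item_b)
```

Here `item_a` would be flagged by `flag_extreme`, while `item_b` would not. Note that with responses coded 1 to k, this formula cannot fall below 1/k, so with a 4-point scale the lower bound in practice is 0.25.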
Item discrimination refers to the ability of an item to discriminate between participants who will score high or low on a questionnaire, that is how well one item reflects the whole scale. In CTT, this is assessed with part-whole corrected item-to-total correlation (Pearson's r, values >.40 are considered adequate). In IRT, items with slope values of 0.65-1.34 are considered moderate, values >1.34 high (R Development Core Team, 2017).
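The part-whole corrected item-to-total correlation can be illustrated with a short sketch. It assumes complete data (no missing values) stored as one row per participant, and the example matrix is hypothetical; the study itself used SPSS for this step.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(data: Sequence[Sequence[int]], item: int) -> float:
    """Correlate one item with the sum of all *other* items.

    Removing the item from the total is the part-whole correction:
    it prevents the item from correlating with itself.
    """
    item_scores = [row[item] for row in data]
    rest_scores = [sum(row) - row[item] for row in data]
    return pearson_r(item_scores, rest_scores)


# Hypothetical data: 5 participants x 4 items (not study data).
data = [
    [1, 1, 2, 4],
    [2, 2, 2, 3],
    [3, 3, 4, 4],
    [4, 4, 3, 3],
    [2, 3, 3, 4],
]
r_first = corrected_item_total(data, 0)
r_last = corrected_item_total(data, 3)
```

In this toy matrix the first item exceeds the .40 criterion and the last item does not, which mirrors how a poorly fitting item would be identified.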
Cronbach's alpha was calculated as a measure of reliability (internal consistency) for minors and adults, respectively, and for minors aged ≤10 and >10 years, respectively. Cronbach's alpha >.80 is considered good. Correlations between the PUTS items and the YGTSS scores were calculated using Spearman's rank coefficients.
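For readers unfamiliar with the statistic, Cronbach's alpha can be sketched with the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j s_j^2}{s_{total}^2}\right)$, where $k$ is the number of items, $s_j^2$ the variance of item $j$ and $s_{total}^2$ the variance of the sum score. The sketch below assumes complete rows and uses hypothetical data; it is not the routine used in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha for a participants x items matrix (complete data)."""
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across participants.
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    # Sample variance of the sum score across participants.
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item example (not study data).
alpha_example = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])
```

The function is undefined when all sum scores are identical (zero total variance), so a real analysis would need to guard against that case.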
Furthermore, a weighted least squares factor analysis for ordinal data with varimax rotation was conducted in Lisrel (Jöreskog & Sörbom, 2018). Due to low inter-item correlations (Table S1), items 9 and 10 were not included in the factor analysis (Field, 2013; Raines et al., 2018).

Item development
To extend the urge quality subscale, items were developed based on the literature (Banaschewski et al., 2003; Kurlan, Lichter, & Hewitt, 1989). Every quality described in the literature in association with urges was included as an item, using the phrasing of the original PUTS 'right before I do a tic...'. Existing items that contained more than one construct (item 1 refers to a ticklish or itchy feeling) were rephrased into separate items, so that each item refers to one construct only (Price, Jhangiani, & Chiang, 2015).
For the severity items, a deductive approach was used (Burisch, 1984). The chosen definition of the construct to be measured was developed based on the pre-existing literature and reads as follows: 'premonitory urges are uncomfortable sensations or feelings that increase before a tic is executed but may vary in intensity between individuals, tics and time points.' Based on this construct, VB and AM formulated a set of items, based on the principle that items should be 'brief,' 'relevant,' 'unambiguous,' 'specific' and 'objective' (Price et al., 2015). The items were reviewed and adjusted after feedback from KMV, KD, GJ and SJ. The questionnaire was then translated to German by VB and back-translated to English by DG. English and German versions of the questionnaire were given to two patients each for initial feedback regarding the nonambiguity and face validity of the items.
It is important to stress that these items will now need evaluation and selection based on their psychometric properties. The items should be given to a large group of GTS patients, together with a number of measures that assess the same construct (i.e., urge severity) and related but different constructs (e.g., obsessive-compulsive disorder [OCD]) in order to determine individual item properties, convergent validity (high correlations with measures that assess the same construct) and discriminant validity (low-medium correlations with scales that assess other, possibly related constructs). Factor analyses should be used to test the dimensionality of the questionnaire. Items with unfavourable psychometric properties should be excluded from the final version of a revised PUTS scale.

Item difficulty and discrimination
Item 10 shows high item difficulty for both adults and minors, indicating that most participants select 3 or 4 for this item. Hence, this item is not ideally suited to differentiate between patients with intense and less intense urges. Non-ordered thresholds further indicate that patients with less intense urges do not necessarily select less intense response options for items 9 and 10. The same is true for item 5 in adults and item 1 in both groups. Item 1 has a rather low item difficulty, indicating that most patients select 1 or 2 and only few patients select higher response options. Item 10 in both groups and item 1 in adults had low item-test correlations, indicating they did not reflect the construct measured by the rest of the scale (see Table 1).

Dimensions of the PUTS
Varimax rotated factor solutions across PUTS 1-8 items for minors and adults, respectively, showed two factors (see Table 2). Items 1, 6, 7 and 8 loaded on one factor (previously termed intensity items), while items 2-5 (adults) and 3-5 (minors) loaded on the other factor (quality items). Item 2 thus loaded on different factors in minors and adults.

Convergent validity between the PUTS and YGTSS scale
Spearman's rank correlations between the PUTS factors identified above and YGTSS subscales in minors and adults can be found in Table 3.

Discussion
Psychometric properties of the PUTS
The primary aim of this study was to examine the psychometric properties according to CTT and IRT, and dimensions of the PUTS in minors and adults with GTS in a large sample. The second aim was to use these results to make clear recommendations for a revised PUTS. Key findings are summarized below.

Item difficulty and discrimination
Combining CTT and IRT, the statistical analyses confirm that item 10 of the PUTS (suppressibility of tics) has unacceptable thresholds/item difficulty and discrimination parameters in minors and adults. The findings are in line with previous studies (Brandt, Beck, Sajin, Anders, et al., 2016; Reese et al., 2014; Woods et al., 2005). Item response theory-based analyses showed that thresholds were not ordered for items 1 and 9. Previous studies have already shown small inter-item (Raines et al., 2018) and item-test correlations (Brandt, Beck, Sajin, Anders, et al., 2016) between items 1 and 9 and the other items of the PUTS as well as another urge measure (Brandt, Beck, Sajin, Anders, et al., 2016). Overall, the results consistently suggest that the psychometric properties of these items are not ideal to assess urges (Brandt, Beck, Sajin, Anders, et al., 2016; Raines et al., 2018). Interestingly, thresholds indicated that item 5 should also be reviewed in the adult PUTS.
The results presented here support previous work which strongly suggests that the PUTS scale would benefit from the removal of item 10 (Capriotti, Brandt, Turkel, Lee, & Woods, 2014; Steinberg et al., 2010). Furthermore, items 1 and 9 should be rephrased in order to enhance their representation of the overall construct measured by the PUTS.

Dimensions of the PUTS
Previous research findings pointed towards three underlying dimensions (Brandt, Beck, Sajin, Anders, et al., 2016) of the PUTS 1-10 or two dimensions, if items 9 and 10 were excluded (Raines et al., 2018). The current study confirms the existence of two distinct dimensions within the PUTS 1-8 in both adults and minors.
In keeping with previous findings, one factor included items 2-5 in adults (item 2: feeling pressure, item 3: feeling wound up or tense, item 4: 'not just right' feelings, item 5: feeling incompleteness) and items 3-5 in minors and can be called 'quality of premonitory sensations.' A second cluster consisted of items 6 (feeling of energy), 7 ('I have these feelings almost all the time before I do a tic') and 8 (feelings happen for every tic). This factor was originally termed the 'intensity factor' because it loaded on one factor with an independent measure (the 'real-time urge monitor') that assessed urge intensity over time (Brandt, Beck, Sajin, Anders, et al., 2016). However, it has since been suggested that the factor may be more accurately referred to as an 'urge frequency' factor (Raines et al., 2018). It is possible that urge intensity as measured by the real-time urge monitor (Brandt, Beck, Sajin, Baaske, et al., 2016) is highly correlated with urge frequency. Correlations with the YGTSS subscales did indeed show that the intensity/frequency PUTS factor had its highest correlations with the YGTSS tic frequency and tic number subscales in minors, closely followed by tic intensity, while it seemed to have its highest correlations with tic complexity in adults. It is therefore unclear whether this factor represents the same underlying construct in minors and adults. Furthermore, it is unclear how well urge intensity and frequency can be distinguished. An additional limiting factor is that the results depend not only on the psychometric properties of the PUTS but also on those of the YGTSS. As far as the authors are aware, the subscales of the YGTSS have never been confirmed using factor analysis. Further research will be needed with instruments that can differentiate between urge intensity and frequency. We will therefore refer to the intensity/frequency PUTS factor as urge severity for the moment.

Reliability and validity
One of the main criticisms levelled at the PUTS is that it has previously been found to have poor psychometric properties in children 10 years and younger (Martino et al., 2017). However, in contrast to previous studies (Raines et al., 2018; Steinberg et al., 2010; Woods et al., 2005), our work revealed good internal consistency in children younger than 11. It is currently not clear why this is the case, but it is possible that having help from their parents increased reliability of the questionnaire. Future studies should determine the factors that increase reliability of the PUTS in young children in order to optimize study settings (e.g., filling out questionnaires together with a parent or clinician, at home or in the clinic).

The PUTS revised
Based on these findings, we would like to propose a number of changes to the PUTS scale. In general, we would like to propose two PUTS subscales: (1) an urge severity subscale and (2) an urge quality subscale (Appendix). Given the relatively small pool of patients with GTS in a given place, we would like to encourage colleagues in the field to help us evaluate and select items for a revised PUTS either by running their own validation studies or by getting in contact with us to collaborate, using our protocol.
The developers of the PUTS have already suggested dropping item 10 from the overall PUTS score (Reese et al., 2014; Woods et al., 2005); in the same vein, we suggest dropping the item altogether if adjustments are made to revise the scale.
Regarding the structure of the scale, we would propose two changes: (1) change the scale to a 5-point Likert scale so that the scale has a mid-point (please see Appendix for a suggestion of the revised PUTS) and (2) the scale should range from 0 to 4 so that no symptoms or not agreeing with the item corresponds to the value 0 instead of 1. Furthermore, not all items should be phrased so that a higher number represents more severe urges. This helps participants to pay attention to the scale and can help identify participants who automatically tick the same response for each item, without reading the items.
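As a rough illustration of how the proposed structure could be scored, the sketch below handles a 0-4 response scale with some reverse-keyed items and flags respondents who give the same raw answer to every item. The set of reverse-keyed item positions and the example responses are hypothetical; the actual keying would be defined by the final revised questionnaire.

```python
from typing import Sequence, Set


def score_response(raw: int, reverse_keyed: bool, max_value: int = 4) -> int:
    """Score one answer on a 0..max_value scale.

    Reverse-keyed items are flipped so that, after scoring,
    a higher number always means more severe urges.
    """
    if not 0 <= raw <= max_value:
        raise ValueError(f"response {raw} outside 0..{max_value}")
    return max_value - raw if reverse_keyed else raw


def is_straight_lined(raw_responses: Sequence[int]) -> bool:
    """True if every raw answer is identical (possible inattentive responding)."""
    return len(set(raw_responses)) == 1


# Hypothetical positions of reverse-keyed items (0-based), for illustration only.
reverse_items: Set[int] = {2, 5}

raw = [3, 3, 3, 3, 3, 3]  # a respondent who ticks the same box throughout
scored = [score_response(x, i in reverse_items) for i, x in enumerate(raw)]
```

A respondent who ticks the same box throughout is flagged by `is_straight_lined`, and after reverse-scoring their answers no longer look uniformly severe. Uniform answers at the mid-point remain ambiguous, since reversing the mid-point leaves it unchanged.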
Regarding the content of the PUTS, we would suggest testing several additional items using statistical techniques to establish a questionnaire with two dimensions (urge severity and urge quality) in which only items with good or excellent psychometric properties are included. Based on the existing literature, our expertise and the results of this study, we have suggested some additional items and rephrased some existing ones, in order to increase the reliability of the urge severity subscale (Appendix). Future research should investigate the reliability, validity and dimensions of the current version of the revised PUTS scale and exclude unsuitable items in order to create a final PUTS-R scale with excellent psychometric properties.
Regarding the urge quality subscale, we would propose to add more items, based on the literature regarding the different qualitative descriptions of urges (Banaschewski et al., 2003; Kurlan et al., 1989).
Future studies might compare and contrast comorbidities, such as OCD and attention-deficit hyperactivity disorder (ADHD), and their possible relationship with different urge qualities in patients with tics (Brandt, Beck, Sajin, Anders, et al., 2016). Whether the urge quality items should be included in a total PUTS score or whether the two subscales should be viewed as entirely separate scales should also be investigated, using other measures of urge intensity and frequency in order to test the validity of the scale. Alternatively, the severity subscale might be a useful clinical indicator of urge intensity or frequency, while the quality subscale of premonitory urges might be viewed as qualitative information for clinicians and a research tool, rather than a severity index. Previous research suggests that there might be interesting associations between premonitory urges in general and obsessive-compulsive behaviour, specifically 'just-right' experiences (Sambrani et al., 2016). This association might be captured by a particular PUTS quality item asking about just-right experiences (Brandt, Beck, Sajin, Anders, et al., 2016) and should be further investigated.
It should be noted that another version of the PUTS, the individualized PUTS (I-PUTS), exists (McGuire et al., 2016). This questionnaire assesses presence, frequency, intensity and body region of urges for each tic that a patient reports (symptom checklist parallel to the YGTSS). A strength of this questionnaire is that it is likely sensitive to change. Weaknesses include that it was not developed based on psychometric properties, that convergent validity was low among the psychometric properties that have been assessed, and that its length and score depend on the number of tics a patient reports.

Limitations
Comorbidities and the influence of medication were not analysed in the current study.
The factor structure was not tested in children <10 years independently because this group was not large enough. Whether the PUTS is suitable for children aged 10 and younger needs to be further explored.

Conflicts of interest
All authors declare no conflict of interest.

Data availability statement
The data that support the findings of this study are available on request from the corresponding author, provided that a data sharing agreement is signed. The data are not publicly available due to ethical restrictions.